Abstract

We define the common randomness assisted capacity of an arbitrarily varying wiretap channel
(AVWC) when the Eavesdropper is kept ignorant about the common randomness. We prove
a multi-letter capacity formula for this model. We also prove that, if enough common randomness
is used, the capacity formula can again be given in single-letter form.

We then consider the opposite extremal case, where no common randomness is available, and 
derive the capacity. It is known that the capacity of the system can be discontinuous under 
these circumstances. We prove here that it is still stable in the sense that it is continuous 
around its positivity points. We further prove that discontinuities can only arise if the legal 
link is symmetrizable and characterize the points where it is positive. These results shed new 
light on the design principles of communication systems with embedded security features. 
Finally, we investigate the effect of super-activation of the message transmission capacity
of AVWCs under the average error criterion. We give a complete characterization of those
AVWCs that may be super-activated. The effect is thereby also related to the (conjectured)
super-activation of the common randomness assisted capacity of AVWCs with an eavesdropper
that gets to know the common randomness.

Super-activation is based on the idea of “wasting” a few bits of non-secret messages in order 
to enable provably secret transmission of a large bulk of data, a concept that may prove to 
be of further importance in the design of communication systems. In this work we provide 
further insight into this phenomenon by providing a class of codes that is capacity-achieving 
and does not convey any information to the Eavesdropper. 
1 Introduction

As in our previous work [38], we investigate a model at the intersection of the
two areas of secrecy and robust communication in information theory: the arbitrarily varying
wiretap channel (AVWC). The communication scenario is depicted in Figure 1.

In this model, a sender (Alice) would like to send messages to a legitimate receiver (Bob) over
a noisy channel. Two other parties are involved in the scenario: a jammer (James) who can
actively influence the channel and a second but illegitimate receiver (Eve). Alice’s and Bob’s
goal is to achieve reliable and secure communication:

First, Bob should be able to decode Alice’s messages with high probability (with respect to the 
average error criterion) no matter what the input of James is. 

Second, the mutual information between the messages and Eve’s output should be close to zero. 
Again, this has to be the case no matter what the input of James is. 

As in our previous work, we add the option of Alice and Bob having access to perfect copies
of the outcomes of a random experiment G (a source of common randomness). While in our
previous work [38] we considered the case where Eve gets an exact copy of the outcomes received
by Alice and Bob, we now extend our study to the case where Eve remains completely ignorant.
The only party that has no access to G in any of the scenarios we study is James. We call the
capacities which we derive from the two scenarios the “correlated random coding mean secrecy 
capacity” if Eve has information about G and “secret common randomness assisted secrecy 
capacity” if Eve has no information about it. When no common randomness is present at all, 
we speak of the “uncorrelated coding secrecy capacity”. For the sake of an extended discussion 
of secrecy criteria we also define a “capacity with public side-information” which is the data 
transmission benchmark for systems where Eve gets to know a part of the messages. 

From now on, we use the label $C_S$ for the uncorrelated coding secrecy capacity (when no shared
randomness is available between Alice and Bob) and $C_{S,\mathrm{ran}}^{\mathrm{mean}}$ for the correlated random coding
mean secrecy capacity. Just as in our previous work [38], we restrict attention to the case where
common randomness is used; we apologize to the reader who is not familiar with that work, as
some of our results rely on it. The secret common randomness assisted secrecy
capacity is labelled $C_{\mathrm{key}}$ and the capacity with public side information $C_{pp}$. As is depicted in
Figure 1, it is of vital importance that Eve cannot communicate with James.
Figure 1: Secure coding schemes for correlated random coding (left) and secret common
randomness assisted coding (right)


We give a unified treatment of the subject which allows us to observe the behaviour of the 
system while we change the amount of and the access to the common randomness: for common 
randomness set to zero one observes instabilities of the system (in the sense that the capacity is 
not a continuous function of the channel parameters anymore) and the effect of super-activation. 
Roughly speaking, two channels show super-activation when each of them cannot be used for 
a certain task (e.g. reliable communication under average error, maximal error or zero error 
criterion or, as in this work, secure communication) alone, but if a joint use is allowed the task 
becomes feasible. A more precise formulation is given in equations (5) to (7), while the definition
is part of Definition 11, which is followed by a short discussion of super-activation in the scenario
treated here. If common randomness is used between Alice and Bob but Eve gets to know it
as well, it is known from the results in [38] that already small amounts of common randomness
(a logarithmic number of bits, counted in block-length) resolve the instabilities (in the sense
that the correlated random coding capacity is a continuous function of the channel parameters).
It remains unknown whether super-activation is possible when common randomness is present,
and this question is the content of Conjecture 1.

The full advantage from common randomness can only be gained if Eve is kept ignorant of it. 
If common randomness is used at a nonzero rate, this rate adds linearly to the capacity of the 
system. All the capacity formulas which can be proven to hold in the various nontrivial scenarios 
are given by multi-letter formulae. Only if the common randomness exceeds the maximal amount 
of information which can be leaked to Eve do we recover a single-letter description. At that
point, the linear increase in capacity stops. In order to carve out these principal features of secure
data transmission in a mathematical framework that is both exact and elegant, we let the number n of
channel uses go to infinity.

We will now sketch the connections of our work with some of the highlights and landmarks in 
the earlier literature. While we do not attempt to work in full rigour in the introduction, we 
will nonetheless gradually introduce some mathematical notation. 

The probabilistic law which governs the transmission of codewords sent by Alice and jamming
signals sent by James to Eve and Bob is, for n channel uses, given by

\[ w^{\otimes n}(y^n|x^n,s^n)\, v^{\otimes n}(z^n|x^n,s^n) = \prod_{i=1}^{n} w(y_i|x_i,s_i)\, v(z_i|x_i,s_i). \tag{1} \]

Figure 2: Scaling of the secrecy capacity $C_{\mathrm{key}}(\mathfrak{W},\mathfrak{V},G)$ with the rate G of secret common randomness. It holds
$X = C_{S,\mathrm{ran}}^{\mathrm{mean}}(\mathfrak{W},\mathfrak{T}) - C_{S,\mathrm{ran}}^{\mathrm{mean}}(\mathfrak{W},\mathfrak{V})$, where $\mathfrak{T}$ is defined below after equation (1).

Here, $s^n = (s_1,\ldots,s_n)$ are the inputs of James, $x^n = (x_1,\ldots,x_n)$ those of Alice and $z^n =
(z_1,\ldots,z_n)$ the outputs of Eve, while $y^n = (y_1,\ldots,y_n)$ are received by Bob. All letters are

assumed to be taken from finite alphabets. The action of the channel is, for each natural number
n and therefore also as a whole, completely described by the pair (W, V) of matrices of conditional
probabilities, and this could rightfully be called an interference channel with non-cooperating
senders and receivers. With respect to the historical development we will nonetheless prefer to
use a description via the pair $(\mathfrak{W},\mathfrak{V}) = ((w(\cdot|\cdot,s))_{s\in\mathcal{S}}, (v(\cdot|\cdot,s))_{s\in\mathcal{S}})$ and the label “AVWC”.
This model has two important restrictions which are widely known: The case where $\mathfrak{V}$ does not
convey any information about either one of its inputs is the arbitrarily varying channel (AVC).
We will denote this special channel by $\mathfrak{T} = (T)$, where $t(z|x,s) = 1/|\mathcal{Z}|$ for all z, x and s. Before we
give some credit to the historical developments in the area, we would like to emphasize that the
notation introduced in (1) extends to products of arbitrary channels from $\mathcal{X}_1$ to $\mathcal{Y}_1$ and $\mathcal{X}_2$ to $\mathcal{Y}_2$: let
them be denoted $W_1$ and $W_2$ with respective transition probability matrices $(w_1(y|x))_{x\in\mathcal{X}_1, y\in\mathcal{Y}_1}$
and $(w_2(y|x))_{x\in\mathcal{X}_2, y\in\mathcal{Y}_2}$. The transition probability matrix of $W_1\otimes W_2$ is then defined by
$w(y_1,y_2|x_1,x_2) := w_1(y_1|x_1)\cdot w_2(y_2|x_2)$ (for all $x_1\in\mathcal{X}_1$, $x_2\in\mathcal{X}_2$, $y_1\in\mathcal{Y}_1$ and $y_2\in\mathcal{Y}_2$).
The notation then carries over to arbitrarily varying channels, where we set

\[ \mathfrak{W}\otimes\mathfrak{W}' := (W_s\otimes W'_{s'})_{s\in\mathcal{S},\, s'\in\mathcal{S}'}. \tag{2} \]
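
The product construction can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: channels are stored as nested lists `w[x][y]`, an AVC is a list of such channels indexed by the state, and the two toy channels as well as the helper names `tensor_channel` and `tensor_avc` are hypothetical and not part of this work.

```python
from itertools import product


def tensor_channel(w1, w2):
    """Transition probabilities of the product channel of w1 and w2.

    w1[x1][y1] and w2[x2][y2] are row-stochastic nested lists. The result is
    a dict mapping ((x1, x2), (y1, y2)) to w1(y1|x1) * w2(y2|x2).
    """
    out = {}
    for x1, x2 in product(range(len(w1)), range(len(w2))):
        for y1, y2 in product(range(len(w1[0])), range(len(w2[0]))):
            out[(x1, x2), (y1, y2)] = w1[x1][y1] * w2[x2][y2]
    return out


def tensor_avc(avc1, avc2):
    """State-indexed family of product channels, one for each pair (s, s')."""
    return {(s, t): tensor_channel(avc1[s], avc2[t])
            for s in range(len(avc1)) for t in range(len(avc2))}


# Hypothetical toy channels: a binary symmetric channel and a Z-channel.
bsc = [[0.9, 0.1], [0.1, 0.9]]
zch = [[1.0, 0.0], [0.5, 0.5]]
prod_ch = tensor_channel(bsc, zch)
joint = tensor_avc([bsc, zch], [bsc])
```

Note that the state set of the product is the set of all pairs (s, s'), so the jammer may choose the two states independently of each other.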

The model of an arbitrarily varying channel was introduced by Blackwell, Breiman and
Thomasian in 1960. They derived a formula for the capacity of an AVC with shared
randomness-assisted codes under the average error criterion, and we will restrict our discussions
to this criterion, although important nontrivial results concerning message transmission under
the maximal error criterion have been obtained as well. In [1] it was shown that an
explicit formula for the (weak) capacity of an AVC under the maximal error criterion would imply
a formula for the zero-error capacity of a discrete memoryless channel. The latter problem has
now been open for half a century.

In [2], Ahlswede developed an elegant and streamlined method of proof that, together with the 
random coding results of Blackwell, Breiman and Thomasian, enabled him to prove the following: the capacity of an AVC (under
the average error probability criterion) is either zero or equals its random coding capacity. This 
dichotomic behaviour is extended in the present work to the case where there is a (nontrivial) 
eavesdropper that has access to the shared randomness. 

After the discoveries made in [2], an important open question was when exactly the deterministic
capacity with vanishing average error is equal to zero, and in some sense the corresponding
question for the AVWC is left open by us as well. In 1985, a first step towards a solution was
made by Ericson, who came up with a sufficient condition that was proven to be necessary
by Csiszár and Narayan [22] in 1989.

The condition which was developed by Ericson, called symmetrizability, reads as follows: An
AVC $\mathfrak{W}$ is called symmetrizable if there is a set $(u(\cdot|x))_{x\in\mathcal{X}}$ of probability distributions on $\mathcal{S}$
such that for every $x,x'\in\mathcal{X}$ and $y\in\mathcal{Y}$ we have

\[ \sum_{s\in\mathcal{S}} u(s|x)\, w(y|x',s) = \sum_{s\in\mathcal{S}} u(s|x')\, w(y|x,s). \tag{3} \]

An arbitrarily varying channel $\mathfrak{W}$ that is symmetrizable cannot be used for reliable transmission
of messages, as any input x can, at least in an average sense, be made to look as if it had
been another input x'. An example of a symmetrizable AVC that cannot be used for reliable
transmission of messages just by using one encoder-decoder pair but still has a positive capacity
for correlated random codes was given in [12] and later reused as an example in subsequent work. This
exemplary AVC also serves as an important ingredient to the super-activation results in [16]
and is, as an important example, also to be found in Remark 7 of this document.
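
To make condition (3) tangible, the following sketch checks whether a given candidate family u symmetrizes a given AVC. It only verifies a candidate; deciding symmetrizability in general amounts to a linear feasibility problem in u, which is not attempted here. The adder channel used as input is an illustrative assumption and is not claimed to be the example from [12]; the array layout `w[s][x][y]` and the function name are likewise hypothetical.

```python
def symmetrizability_gap(w, u):
    """Largest violation of condition (3) for a candidate family u.

    w[s][x][y] holds w(y|x,s) and u[x][s] holds u(s|x). A gap of zero means
    that u symmetrizes the AVC described by w.
    """
    n_s, n_x, n_y = len(w), len(w[0]), len(w[0][0])
    gap = 0.0
    for x in range(n_x):
        for xp in range(n_x):
            for y in range(n_y):
                lhs = sum(u[x][s] * w[s][xp][y] for s in range(n_s))
                rhs = sum(u[xp][s] * w[s][x][y] for s in range(n_s))
                gap = max(gap, abs(lhs - rhs))
    return gap


# Toy adder AVC (illustrative assumption): y = x + s with x, s in {0, 1}.
adder = [[[1.0 if y == x + s else 0.0 for y in range(3)]
          for x in range(2)] for s in range(2)]

copy_input = [[1.0, 0.0], [0.0, 1.0]]  # jammer state mimics the input x
constant = [[1.0, 0.0], [1.0, 0.0]]    # jammer always uses state 0

gap_copy = symmetrizability_gap(adder, copy_input)
gap_const = symmetrizability_gap(adder, constant)
```

For the adder channel the copying strategy satisfies (3) exactly, since the output x + s is symmetric in its two arguments, whereas the constant strategy does not.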

On the technical side, this work makes heavy use of the results that were obtained in the
work [22] by extending one of their central results to the situation where Eve gets some
information via V. Namely, we are able to prove the following: If $\mathfrak{W}$ is non-symmetrizable,
then $C_S(\mathfrak{W},\mathfrak{V}) = C_{S,\mathrm{ran}}^{\mathrm{mean}}(\mathfrak{W},\mathfrak{V})$ for all possible $\mathfrak{V}$. We do not attempt to give a necessary and
sufficient condition for $C_S$ to be positive, since a geometric characterization in the spirit of the
symmetrizability condition (3) is not even known for the usual wiretap channel. Rather, when
speaking about the wiretap channel one usually refers to the concept of “less noisy” channels.

The wiretap channel has been studied widely in the literature. The analysis started with the
celebrated work of Wyner; an important follow-up work was [21] by Csiszár and Körner.
While Wyner only treated the degraded case, Csiszár and Körner derived the capacity for the
general discrete memoryless wiretap channel. The wiretap channel in the presence of common
randomness which is kept secret from Eve (in this scenario, one could equally well speak of a
secret key) was studied by Kang and Liu in [30].

In recent years there has been a growing interest in more elaborate models which combine insufficient
channel state information with secrecy requirements. Probably the earliest publications
which came to our attention are the work [34] by Liang, Kramer, Poor and Shamai and the
paper by Bloch and Laneman. Shortly after, two papers by Bjelaković, Boche
and Sommerfeld, among them [9], were published. The work [9] provides a lower bound on the secrecy capacity of
the compound wiretap channel with channel state information at the transmitter that matches
an upper bound on the secrecy capacity of general compound wiretap channels given
in [34], establishing a full coding theorem in this case. Important contributions of the second
of these works are a lower bound on what is called the “random code secrecy capacity” there, as well as a
multi-letter expression for the secrecy capacity in the case of a best channel to the eavesdropper.
The approach taken in this publication is closely related to the one taken there, but the use of
different proof techniques enables us to provide much stronger results. An interesting parallel
development is the work [29] by He, Khisti and Yener, which studies a two-transmitter Gaussian
multiple access wiretap channel with multiple antennas at each of the nodes. A characterization
of the secrecy degrees of freedom region under a strong secrecy constraint is derived there.

A surprising result that was discovered only recently by Boche and Wyrembelski in [16] is
that of super-activation of AVWCs. We will explain this example in more detail in Remark
7. This effect was until then only known for information transmission capacities in quantum
information theory, where it was proven by Smith and Yard in [36] that there exist channels
which have the property that each of them alone has zero capacity but the two together have a
positive capacity.

Before the work [16], this was assumed to be an effect which only shows up in quantum systems,
where it was observed e.g. in [36].

The work [16] gave an explicit example of super-activation, which we repeat in Remark 7, but
a deeper understanding of the effect was not achieved. Based on our finer analysis, we are
now able to provide the following results: First, we give a much clearer characterization of
super-activation of the uncorrelated^1 coding secrecy capacity in Theorem 2. Second, and more
for the sake of a clean discussion of coding and secrecy concepts, we define the capacity $C_{pp}$
which explicitly keeps a part of the messages public (such that it may be that Eve is able to
decode them). We do not attempt to give a further characterization of $C_{pp}$ in this work, but
we show that this capacity shows super-activation as well, by use of previously developed
code concepts. Details are given later, together with the exact definition
of $C_{pp}$.

We will now give a broad sketch of our results concerning $C_S$ and $C_{S,\mathrm{ran}}^{\mathrm{mean}}$, before we
start concentrating on $C_{\mathrm{key}}$. It was proven in [38] that $C_{S,\mathrm{ran}}^{\mathrm{mean}}$ is a continuous quantity, and
while the statement may seem trivial at first sight, it becomes highly nontrivial when the
following is taken into account:

There is at least no obvious way to deduce this statement directly just from the definition of
the capacity, without first proving a coding result, and the latter route was taken in [38], where
an explicit formula for $C_{S,\mathrm{ran}}^{\mathrm{mean}}$ was found:

\[ C_{S,\mathrm{ran}}^{\mathrm{mean}}(\mathfrak{W},\mathfrak{V}) = \lim_{n\to\infty} \frac{1}{n} \max_{p\in\mathcal{P}(\mathcal{U}_n)} \max_{U\in\mathcal{C}(\mathcal{U}_n,\mathcal{X}^n)} \left( \min_{q\in\mathcal{P}(\mathcal{S})} I(p; W_q^{\otimes n}\circ U) - \max_{s^n\in\mathcal{S}^n} I(p; V_{s^n}\circ U) \right). \tag{4} \]

Explicit bounds on $|\mathcal{U}_n|$ were given as well. While one may argue that this is not an efficient
description since one is forced to compute the limit of a sequence of convex optimization problems,
it turns out to be an incredibly useful characterization in the following sense: First, it enables
one to prove that $C_{S,\mathrm{ran}}^{\mathrm{mean}}$ is a continuous function of the pair $(\mathfrak{W},\mathfrak{V})$, and this result was obtained
in [38].

As has already been pointed out in earlier work, the continuous dependence of the performance of a
communication system on the relevant system parameters is of central importance. To give just
one example, consider recent efforts to build what is called “smart grids”. Such systems
certainly have high requirements concerning both reliability and stability of the communication
in order to avoid potentially damaging consequences for their users.

While it is very interesting from a mathematical point of view, it certainly comes as an unpleasant
surprise then that $C_S$ does not grant us the favour of being a continuous function of the
channel. On the other hand, this highlights the importance of distributed resources in
communication networks, in this case the use of small amounts of common randomness. While
one may now be tempted to think that the transmission of messages over AVWCs without the
use of common randomness is a rather adventurous task, we are also able to prove that such a
perception is wrong: Our analysis shows that $C_S$ is continuous around its positivity points (this
has been observed for classical-quantum arbitrarily varying channels in [15] already), and we
are able to give an exact characterization of the discontinuity points as well. An example of a
point of discontinuity has been given in [18].

^1 Note that, due to the presence of an eavesdropper, it makes sense to allow the use of randomized encodings.
Using, in such cases, the term “random code” is much too imprecise due to the potential presence of shared
randomness between sender and receiver. Thus, we prefer to use the term uncorrelated codes. The random choice
of codewords within an uncorrelated code represents lack of knowledge both for Eve and James. Analysing the
case where James gains additional knowledge provides an interesting research opportunity, but care has to be
taken when modelling the information flow from James to Eve.

Moreover, our characterization of discontinuity relies purely on the computation of functions 
which are continuous themselves, so that a calculation of such points is at least within reach 
also from a computational point of view. 

Further, the deep interconnection between continuity and symmetrizability which shows up
in our work enables us to give a characterization of pairs $(\mathfrak{W}_i,\mathfrak{V}_i)$ (i = 1,2) for which
super-activation is possible. In order to be very explicit about super-activation,
let us note the following:

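The inequality

\[ C_S(\mathfrak{W}_1\otimes\mathfrak{W}_2, \mathfrak{V}_1\otimes\mathfrak{V}_2) \geq C_S(\mathfrak{W}_1,\mathfrak{V}_1) + C_S(\mathfrak{W}_2,\mathfrak{V}_2) \tag{5} \]

follows trivially from the definition of $C_S$. It is common to all notions of capacity which are
known to the authors. In contrast, if the inequality

\[ C_S(\mathfrak{W}_1\otimes\mathfrak{W}_2, \mathfrak{V}_1\otimes\mathfrak{V}_2) > C_S(\mathfrak{W}_1,\mathfrak{V}_1) + C_S(\mathfrak{W}_2,\mathfrak{V}_2) \tag{6} \]

holds, we speak of super-additivity, and only if we can even find AVWCs $(\mathfrak{W}_1,\mathfrak{V}_1)$ and $(\mathfrak{W}_2,\mathfrak{V}_2)$
such that we have

\[ C_S(\mathfrak{W}_1,\mathfrak{V}_1) = C_S(\mathfrak{W}_2,\mathfrak{V}_2) = 0, \quad \text{but} \quad C_S(\mathfrak{W}_1\otimes\mathfrak{W}_2, \mathfrak{V}_1\otimes\mathfrak{V}_2) > 0 \tag{7} \]

we speak of super-activation.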
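
The three cases (5) to (7) can be summarized in a few lines of code. The sketch below assumes that the three capacity values are already known numbers; computing them is the actual difficulty and is not addressed. The function name and the tolerance parameter are hypothetical choices made for illustration.

```python
def classify_joint_use(c1, c2, c_joint, tol=1e-12):
    """Classify the joint use of two AVWCs according to (5), (6) and (7).

    c1, c2 are the individual capacities and c_joint is the capacity of the
    product system; tol absorbs floating point rounding.
    """
    if c_joint < c1 + c2 - tol:
        raise ValueError("the values violate inequality (5)")
    if c1 <= tol and c2 <= tol and c_joint > tol:
        return "super-activation"      # condition (7)
    if c_joint > c1 + c2 + tol:
        return "super-additivity"      # strict inequality (6)
    return "additivity"                # equality in (5)
```

Super-activation is thus the special case of super-additivity in which both individual capacities vanish.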

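While it is clear from explicit examples that super-activation of $C_S$ is possible, it turns out
in our work via Theorem 5 that the effect is connected to the super-activation of $C_{S,\mathrm{ran}}^{\mathrm{mean}}$, if the
latter occurs. We would therefore like to take the opportunity of spurring future research by
stating the following conjecture:

Conjecture 1. There exist pairs $(\mathfrak{W}_1,\mathfrak{V}_1)$ and $(\mathfrak{W}_2,\mathfrak{V}_2)$ of (finite) AVWCs such that

\[ C_{S,\mathrm{ran}}^{\mathrm{mean}}(\mathfrak{W}_1,\mathfrak{V}_1) = C_{S,\mathrm{ran}}^{\mathrm{mean}}(\mathfrak{W}_2,\mathfrak{V}_2) = 0, \tag{8} \]

but

\[ C_{S,\mathrm{ran}}^{\mathrm{mean}}(\mathfrak{W}_1\otimes\mathfrak{W}_2, \mathfrak{V}_1\otimes\mathfrak{V}_2) > 0. \tag{9} \]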

An initial definition of objects such as $\mathfrak{W}_1\otimes\mathfrak{W}_2$ has been given in equation (2) and is repeated
in Subsection 2.2. As a last introductory statement concerning super-additivity, let us
mention the connection of super-activation to information transmission in networks: Consider
two orthogonal channels in a mobile communication network. Not taking into account the
issues on the physical layer, one may end up with a description of these channels via $\mathfrak{W}_1, \mathfrak{W}_2$ from
Alice to Bob and $\mathfrak{V}_1, \mathfrak{V}_2$ from Alice to Eve. The surprising result then is that, while it may be
completely impossible to send information securely over each one of them, there exist coding
schemes which enable Alice to send her information securely if both she and Bob have access
to both $\mathfrak{W}_1$ and $\mathfrak{W}_2$!

We will argue later in Subsection 2.2 how this effect works for the capacity $C_{pp}$. While this
capacity offers an insightful view on the topic, we nevertheless concentrate on the interplay
between $C_S$, $C_{S,\mathrm{ran}}^{\mathrm{mean}}$ and $C_{\mathrm{key}}$ in this work.

Let us now switch our attention to further results presented in this work. As mentioned
already, we also extend earlier research to the case where lots of common randomness can be
used (exponentially many random bits, to be precise) during our investigation of $C_{\mathrm{key}}$. We
do not dive into the issues arising when sub-exponentially many random bits are available,
although the repeated appearance of the activating effect of common randomness in arbitrarily
varying systems seems to deserve a closer study. Our method of proving the direct part
again yields nothing more than the statement that any number of random bits which scales
asymptotically as const. + $(1+\epsilon)\log(n)$ (for some $\epsilon > 0$) is sufficient for evading all issues which
may arise from symmetrizable $\mathfrak{W}$.

Our restriction to positive rates G of common randomness allows us to give an elegant formula
for $C_{\mathrm{key}}$ as follows: For every G > 0, it holds

\[ C_{\mathrm{key}}(\mathfrak{W},\mathfrak{V},G) = \min\{ C_{S,\mathrm{ran}}^{\mathrm{mean}}(\mathfrak{W},\mathfrak{V}) + G,\; C_{S,\mathrm{ran}}^{\mathrm{mean}}(\mathfrak{W},\mathfrak{T}) \}. \tag{10} \]

Here, $\mathfrak{T}$ denotes the AVC consisting only of the memoryless “trash” channel T mapping every
legal input x and jamming input s onto an arbitrary element of $\mathcal{Z}$ with equal probability
($t(z|s,x) = |\mathcal{Z}|^{-1}$). While the reader familiar with the topic would certainly have guessed the
validity of a formula of this form, it is worth noting that this formula is generally “hard to
compute” in the sense that it requires one to calculate the limit in the formula (4), as long as
$G < C_{S,\mathrm{ran}}^{\mathrm{mean}}(\mathfrak{W},\mathfrak{T}) - C_{S,\mathrm{ran}}^{\mathrm{mean}}(\mathfrak{W},\mathfrak{V})$. If this condition is not met, then $C_{\mathrm{key}}(\mathfrak{W},\mathfrak{V},G) = C_{S,\mathrm{ran}}^{\mathrm{mean}}(\mathfrak{W},\mathfrak{T})$.

Since the latter is the usual capacity of the AVC $\mathfrak{W}$, we conclude the following: If enough common
randomness is available, the capacity of the system can be described much more efficiently,
by a formula which does not require regularization anymore!
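
The scaling behaviour expressed by formula (10) can be illustrated numerically. The sketch below treats the two correlated random coding capacities as given inputs; the concrete values 0.2 and 0.7 are hypothetical and chosen only to make the saturation visible. The function name `c_key` is an assumption as well.

```python
def c_key(c_mean_wv, c_mean_wt, g):
    """Evaluate formula (10) for a rate g > 0 of secret common randomness.

    c_mean_wv stands for the correlated random coding mean secrecy capacity
    of the AVWC, c_mean_wt for the same quantity with the trash channel in
    place of the eavesdropper's channel.
    """
    if g <= 0:
        raise ValueError("formula (10) is stated for G > 0 only")
    return min(c_mean_wv + g, c_mean_wt)


# Hypothetical capacity values, for illustration only.
C_WV, C_WT = 0.2, 0.7
rates = [0.1, 0.3, 0.5, 0.8]
values = [c_key(C_WV, C_WT, g) for g in rates]
saturation_rate = C_WT - C_WV  # the quantity called X in Figure 2
```

Below the saturation rate every additional bit of secret common randomness adds one bit to the capacity; above it the capacity stays at the value obtained with the trash channel.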

Again, a look into the area of quantum information theory shows a striking resemblance: The 
capacity formula for the usual memoryless quantum channel has been proven to be given by 
regularized quantities in the general cases, both for entanglement transmission and for message 
transmission. Without going into too much detail about quantum systems we cite here the work 
[23] by Devetak as our main reference underlining this statement, although this work has been
both preceded and followed by important results dealing with the topic.

Apart from specific classes of quantum channels which were shown to have non-regularized capacity
formulae [24] by Devetak and Shor, it has also been proven that the entanglement assisted
capacity for message transmission over quantum channels is given by a one-shot formula [8] by
Bennett, Shor, Smolin and Thapliyal.

To the best of our knowledge, a quantification of the amount of entanglement assistance which 
is necessary in order to turn the capacity formula into a one-shot formula has not been given 
yet. 
2 Notation and Definitions


This section contains notation, conventions, as well as operational definitions and technical
definitions.

2.1 Notation and Conventions 

In the context presented in this work, every finite set will equivalently be called an alphabet.
Such alphabets are denoted by script letters such as $\mathcal{A}, \mathcal{B}, \mathcal{S}, \mathcal{X}, \mathcal{Y}, \mathcal{Z}$. The cardinality of a set
$\mathcal{A}$ is denoted by $|\mathcal{A}|$. Every natural number $N\in\mathbb{N}$ defines a set $[N] := \{1,\ldots,N\}$. The set of
all permutations on such $[N]$ is written $S_N$. The function $\exp:\mathbb{R}\to\mathbb{R}_+$ is defined with respect
to base 2: $\exp(t) := 2^t$. The logarithm log is defined with respect to the same base. For any
$c\in\mathbb{R}$ we define $|c|^+$ by setting $|c|^+ := c$ if $c > 0$ and $|c|^+ := 0$ otherwise. A function $f:\mathcal{A}\to\mathbb{R}$
is nonnegative ($f\geq 0$) if $f(a)\geq 0$ holds for all $a\in\mathcal{A}$. To each finite set $\mathcal{A}$ we associate the
corresponding set $\mathcal{P}(\mathcal{A}) := \{p:\mathcal{A}\to[0,1] : p\geq 0,\ \sum_{a\in\mathcal{A}} p(a) = 1\}$ of probability distributions
on $\mathcal{A}$. Each random variable A with values in $\mathcal{A}$ is associated to the unique $p\in\mathcal{P}(\mathcal{A})$ satisfying
$P(A = a) = p(a)$ for all $a\in\mathcal{A}$. An important subset of $\mathcal{P}(\mathcal{A})$ is the set of its extreme
points. Every such extreme point is a point measure $\delta_a(a') := \delta(a,a')$ where $\delta(\cdot,\cdot)$ is the usual
Kronecker delta. The one-norm distance between two probability distributions $p,p'\in\mathcal{P}(\mathcal{A})$ is
$\|p-p'\|_1 := \sum_{a\in\mathcal{A}} |p(a)-p'(a)|$.

The expectation of a function $f:\mathcal{A}\to\mathbb{R}$ with respect to a distribution $p\in\mathcal{P}(\mathcal{A})$ is written
$\mathbb{E}_p f$ or, if p is clear from the context, simply $\mathbb{E} f$.

For each alphabet $\mathcal{A}$ and natural number $n\in\mathbb{N}$ we can build the corresponding product alphabet
$\mathcal{A}^n := \mathcal{A}\times\ldots\times\mathcal{A}$ where $\times$ is the usual Cartesian product and there are exactly n copies of $\mathcal{A}$
involved in the definition of $\mathcal{A}^n$. The elements of $\mathcal{A}^n$ are denoted $a^n = (a_1,\ldots,a_n)$. Each such
element gives rise to the corresponding empirical distribution or type $\bar{N}(\cdot|a^n)\in\mathcal{P}(\mathcal{A})$ defined
via $N(a|a^n) := |\{i : a_i = a\}|$ and $\bar{N}(\cdot|a^n) := \frac{1}{n}N(\cdot|a^n)$. Given $\mathcal{A}$ and $n\in\mathbb{N}$, the set of all
empirical distributions arising from an element $a^n\in\mathcal{A}^n$ is $\mathcal{P}_0^n(\mathcal{A}) := \{\bar{N}(\cdot|a^n) : a^n\in\mathcal{A}^n\}$. Each
type $p\in\mathcal{P}_0^n(\mathcal{A})$ defines the typical set $T_p := \{a^n : \bar{N}(\cdot|a^n) = p(\cdot)\}$.
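
The notions of type and typical set are easy to explore by brute force for small block lengths. The sketch below is only an illustration: it enumerates all sequences, which is feasible for tiny alphabets and small n only, and the function names are hypothetical. Exact fractions are used so that types can be compared without rounding issues.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def empirical_type(seq, alphabet):
    """Type of seq: relative frequency of each letter, as exact fractions."""
    counts = Counter(seq)
    n = len(seq)
    return tuple(Fraction(counts[a], n) for a in alphabet)


def all_types(alphabet, n):
    """Set of all types of sequences of length n over the alphabet."""
    return {empirical_type(seq, alphabet)
            for seq in product(alphabet, repeat=n)}


def typical_set(p, alphabet, n):
    """All sequences of length n whose type equals p."""
    return [seq for seq in product(alphabet, repeat=n)
            if empirical_type(seq, alphabet) == p]


p = empirical_type("aab", "ab")
types = all_types("ab", 3)
t_p = typical_set(p, "ab", 3)
```

For a binary alphabet and n = 3 there are only four types, while the number of sequences is eight; this polynomial growth of the number of types, at most $(n+1)^{|\mathcal{A}|}$, is what makes type-based arguments effective.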

Channels are given by affine maps $W:\mathcal{P}(\mathcal{A})\to\mathcal{P}(\mathcal{B})$. The set of channels is denoted
$\mathcal{C}(\mathcal{A},\mathcal{B})$. Every channel is uniquely represented by (and can therefore be identified with) its set
$\{w(b|a)\}_{a\in\mathcal{A},b\in\mathcal{B}}$ of transition probabilities, which are defined via $w(b|a) := W(\delta_a)(b)$. It acts as

\[ W(p) := \sum_{a\in\mathcal{A}}\sum_{b\in\mathcal{B}} w(b|a)\,p(a)\,\delta_b, \tag{11} \]

where both $W(p)\in\mathcal{P}(\mathcal{B})$ and $(\delta_b)_{b\in\mathcal{B}}\subset\mathcal{P}(\mathcal{B})$ (another way of writing the above formula would
be to set $W(p)(\cdot) = \sum_{a,b} w(b|a)p(a)\delta_b(\cdot)$ or even $W(p)(b) = \sum_{a\in\mathcal{A}} w(b|a)p(a)$). As a
shorthand, we may occasionally also write Wp to denote W(p), in analogy to linear algebra
(every channel is naturally associated to its representing stochastic matrix $(w(b|a))_{a,b}$ and can
therefore be extended to a linear map on the appropriate vector spaces).
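
The action (11) of a channel on an input distribution is a matrix-vector product, as the following sketch shows. The storage convention `w[a][b]` for w(b|a), the binary symmetric toy channel and the function names are assumptions made for illustration.

```python
def apply_channel(w, p):
    """Output distribution W(p)(b) = sum_a w(b|a) p(a), with w[a][b] = w(b|a)."""
    n_b = len(w[0])
    return [sum(p[a] * w[a][b] for a in range(len(w))) for b in range(n_b)]


def point_measure(a, size):
    """Extreme point delta_a of the probability simplex on `size` letters."""
    return [1.0 if i == a else 0.0 for i in range(size)]


# Hypothetical toy channel: binary symmetric channel with crossover 0.1.
bsc = [[0.9, 0.1], [0.1, 0.9]]
out_uniform = apply_channel(bsc, [0.5, 0.5])
out_point = apply_channel(bsc, point_measure(0, 2))
```

Applying the channel to a point measure returns the corresponding row of transition probabilities, in line with the definition $w(b|a) := W(\delta_a)(b)$.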

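When operating on product alphabets such as $\mathcal{A}\times\mathcal{B}$ we define $p\otimes q\in\mathcal{P}(\mathcal{A}\times\mathcal{B})$ to be the
distribution defined by $(p\otimes q)(a,b) := p(a)q(b)$. Correspondingly, $p^{\otimes n}\in\mathcal{P}(\mathcal{A}^n)$ is defined via
$p^{\otimes n}(a^n) := \prod_{i=1}^{n} p(a_i)$. The same conventions hold for channels: if $V:\mathcal{P}(\mathcal{A})\to\mathcal{P}(\mathcal{B})$ and
$W:\mathcal{P}(\mathcal{A}')\to\mathcal{P}(\mathcal{B}')$, then $V\otimes W:\mathcal{P}(\mathcal{A}\times\mathcal{A}')\to\mathcal{P}(\mathcal{B}\times\mathcal{B}')$ is defined via its transition
probabilities as $(v\otimes w)((b,b')|(a,a')) := v(b|a)w(b'|a')$ and the notation carries over to n-fold
products $W^{\otimes n}$ of $W:\mathcal{P}(\mathcal{A})\to\mathcal{P}(\mathcal{B})$ as before by setting $w^{\otimes n}(b^n|a^n) := \prod_{i=1}^{n} w(b_i|a_i)$.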
For channels $W\in\mathcal{C}(\mathcal{A}\times\mathcal{B},\mathcal{C})$ it will become important to derive a short notation for cases where
one input remains fixed while the other is arbitrary. Such induced channels will be denoted, in
case that this is unambiguously possible, by $W_p$ where

\[ W_p(\delta_a) := W(\delta_a\otimes p). \tag{12} \]

At times it will, in order to straighten out notation, also be necessary to write the transition
probabilities as either $w_p(c|a)$ or even $w(c|a,p)$.

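The Shannon entropy of $p\in\mathcal{P}(\mathcal{A})$ is $H(p) := -\sum_{a\in\mathcal{A}} p(a)\log p(a)$. The relative entropy between
two probability distributions $p,q\in\mathcal{P}(\mathcal{A})$ is $D(p\|q) := \sum_{a\in\mathcal{A}} p(a)\log(p(a)/q(a))$ if $q(a) = 0 \Rightarrow
p(a) = 0$ for all $a\in\mathcal{A}$, and $D(p\|q) := +\infty$ else.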
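
Both quantities translate directly into code. The sketch below follows the conventions stated above (logarithms to base 2, the convention 0 log 0 = 0, and infinite relative entropy when the support condition fails); the function names are hypothetical.

```python
from math import inf, log2


def entropy(p):
    """Shannon entropy H(p) in bits, using the convention 0 * log 0 = 0."""
    return -sum(x * log2(x) for x in p if x > 0)


def relative_entropy(p, q):
    """Relative entropy D(p||q) in bits; infinite if q(a) = 0 < p(a) for some a."""
    if any(x > 0 and y == 0 for x, y in zip(p, q)):
        return inf
    return sum(x * log2(x / y) for x, y in zip(p, q) if x > 0)
```

As a usage example, `entropy([0.5, 0.5])` evaluates to one bit, while `relative_entropy([0.5, 0.5], [1.0, 0.0])` is infinite because the second distribution assigns zero mass to a letter that the first one uses.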

Every $p\in\mathcal{P}(\mathcal{A})$ and channel $W:\mathcal{P}(\mathcal{A})\to\mathcal{P}(\mathcal{B})$ define a joint random variable which we
call (A,B) for the moment and which is defined via $P((A,B) = (a,b)) = p(a)w(b|a)$ (for all
$a\in\mathcal{A}$, $b\in\mathcal{B}$). This enables us to use an equivalent formulation for the mutual information:

\[ I(p;W) := I(A;B). \tag{13} \]

A more operational definition of this quantity can be achieved by noting that the distribution
of (A,B) in this scenario arises from defining $p^{(2)}\in\mathcal{P}(\mathcal{A}\times\mathcal{A})$ by $p^{(2)}(a,a') := p(a)\cdot\delta_a(a')$ for
all $a,a'\in\mathcal{A}$; it then holds $P((A,B) = (a,b)) = ((Id\otimes W)p^{(2)})(a,b)$ for all $a\in\mathcal{A}$, $b\in\mathcal{B}$. The
operational interpretation of this probability distribution is that Alice observes the outcomes a
of some random process. Given any such outcome, she makes one copy of it and sends that copy
over to Bob via the channel W, keeping the original data with herself.

We will go one step further and define mutual information on pairs of sequences $a^n\in\mathcal{A}^n$,
$b^n\in\mathcal{B}^n$ by defining a random variable (A,B) with values in $\mathcal{A}\times\mathcal{B}$ via $P((A,B) =
(a,b)) := \bar{N}(a,b|a^n,b^n)$ and then setting

\[ I(a^n;b^n) := I(A;B). \tag{14} \]
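
Definitions (13) and (14) both reduce to the mutual information of a joint distribution, which the following sketch makes explicit. The representation of joint distributions as dictionaries, the storage convention `w[a][b]` for w(b|a) and all function names are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as {(a, b): probability}."""
    pa, pb = Counter(), Counter()
    for (a, b), pr in joint.items():
        pa[a] += pr
        pb[b] += pr
    return sum(pr * log2(pr / (pa[a] * pb[b]))
               for (a, b), pr in joint.items() if pr > 0)


def mi_input_channel(p, w):
    """I(p; W) as in (13): joint law p(a) w(b|a), with w[a][b] = w(b|a)."""
    joint = {(a, b): p[a] * w[a][b]
             for a in range(len(p)) for b in range(len(w[0]))}
    return mutual_information(joint)


def mi_sequences(a_seq, b_seq):
    """I(a^n; b^n) as in (14): mutual information of the joint type."""
    n = len(a_seq)
    joint = {pair: c / n for pair, c in Counter(zip(a_seq, b_seq)).items()}
    return mutual_information(joint)
```

Two identical balanced binary sequences carry one bit of empirical mutual information, whereas sequences whose joint type is a product distribution carry none.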

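In addition, we will need a suitable measure of distance between AVWCs. Our object of choice
is the Hausdorff distance which we define as follows: For two channels $W, W'\in\mathcal{C}(\mathcal{A},\mathcal{B})$, set

\[ \|W - W'\| := \max_{a\in\mathcal{A}} \|W(\delta_a) - W'(\delta_a)\|_1. \tag{15} \]

Now we define for given $\mathfrak{W} = (W_s)_{s\in\mathcal{S}}$ and $\mathfrak{W}' = (W'_{s'})_{s'\in\mathcal{S}'}$

\[ g(\mathfrak{W},\mathfrak{W}') := \max_{s\in\mathcal{S}}\min_{s'\in\mathcal{S}'} \|W_s - W'_{s'}\|. \]

Then we can ultimately define

\[ d((\mathfrak{W},\mathfrak{V}),(\mathfrak{W}',\mathfrak{V}')) := \max\{ g(\mathfrak{W}\otimes\mathfrak{V}, \mathfrak{W}'\otimes\mathfrak{V}'),\; g(\mathfrak{W}'\otimes\mathfrak{V}', \mathfrak{W}\otimes\mathfrak{V}) \}. \tag{16} \]

This is a metric on the set of finite-state AVWCs with the corresponding alphabets $\mathcal{A}, \mathcal{B}, \mathcal{C}$.
Another ingredient in the following is the notion of the convex hull of a set of channels, which
can for e.g. AVCs $\mathfrak{W} = (W_s)_{s\in\mathcal{S}}$ be defined as

\[ \mathrm{conv}(\mathfrak{W}) := \left\{ W_q = \sum_{s\in\mathcal{S}} q(s) W_s : q\in\mathcal{P}(\mathcal{S}) \right\}. \tag{17} \]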
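
The distance (15), the one-sided quantity g, its symmetrization as in (16) and the elements of the convex hull (17) can be sketched as follows. The code works on generic state-indexed families of channels stored as `avc[s][a][b]`; applying it to an AVWC as in (16) would first require forming the families appearing there, which is not done here. All function names are hypothetical.

```python
def channel_distance(w, w2):
    """Distance (15): maximum over inputs of the one-norm between output laws."""
    return max(sum(abs(x - y) for x, y in zip(row, row2))
               for row, row2 in zip(w, w2))


def one_sided(avc, avc2):
    """The quantity g: max over s of the min over s' of the distance (15)."""
    return max(min(channel_distance(w, w2) for w2 in avc2) for w in avc)


def hausdorff(avc, avc2):
    """Symmetrized distance between two families, following the pattern of (16)."""
    return max(one_sided(avc, avc2), one_sided(avc2, avc))


def convex_combination(avc, q):
    """Element W_q = sum_s q(s) W_s of the convex hull (17)."""
    return [[sum(q[s] * avc[s][a][b] for s in range(len(avc)))
             for b in range(len(avc[0][0]))] for a in range(len(avc[0]))]
```

The one-sided quantity alone is not symmetric: a family contained in another one has one-sided distance zero to it, but not necessarily the other way round, which is why the maximum over both orders is taken.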

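Finally, we would like to mention that for any given $W\in\mathcal{C}(\mathcal{A},\mathcal{B})$, $a\in\mathcal{A}$ and subset $\mathcal{B}'\subset\mathcal{B}$ we
use the notation

\[ w(\mathcal{B}'|a) := \sum_{b\in\mathcal{B}'} w(b|a). \tag{18} \]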





2.2 Models and operational definitions 

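At first, we give a formal definition of an arbitrarily varying wiretap channel. This extends our informal
definition from the introduction, without any change in notation.

Definition 1 (AVWC). Let $\mathcal{X}, \mathcal{Y}, \mathcal{Z}, \mathcal{S}$ be finite sets and for each $s\in\mathcal{S}$, let $W_s\in\mathcal{C}(\mathcal{X},\mathcal{Y})$
and $V_s\in\mathcal{C}(\mathcal{X},\mathcal{Z})$. Define $\mathfrak{W} := (W_s)_{s\in\mathcal{S}}$ and $\mathfrak{V} := (V_s)_{s\in\mathcal{S}}$. The corresponding arbitrarily
varying wiretap channel is denoted $(\mathfrak{W},\mathfrak{V})$. Its action is completely specified by the sequence
$((W_{s^n}, V_{s^n})_{s^n\in\mathcal{S}^n})_{n\in\mathbb{N}}$, where $W_{s^n} := W_{s_1}\otimes\ldots\otimes W_{s_n}$ and $V_{s^n} := V_{s_1}\otimes\ldots\otimes V_{s_n}$.


Remark 1. The AVWC $(\mathfrak{W},\mathfrak{V})$ can equivalently be represented by defining $W\in\mathcal{C}(\mathcal{S}\times\mathcal{X},\mathcal{Y})$
via $w(y|x,s) := W_s(y|x)$ and $V\in\mathcal{C}(\mathcal{S}\times\mathcal{X},\mathcal{Z})$ via $v(z|x,s) := V_s(z|x)$. We will use both
representations interchangeably.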

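Whenever necessary, we will (for $n\in\mathbb{N}$ and $q\in\mathcal{P}(\mathcal{S}^n)$) also use the abbreviations

\[ W_q^n := \sum_{s^n\in\mathcal{S}^n} q(s^n) W_{s^n}, \qquad V_q^n := \sum_{s^n\in\mathcal{S}^n} q(s^n) V_{s^n}, \tag{19} \]

and the corresponding conditional probabilities are defined in the obvious way for all $x^n\in\mathcal{X}^n$,
$y^n\in\mathcal{Y}^n$, $z^n\in\mathcal{Z}^n$:

\[ w_q^n(y^n|x^n) := W_q^n(\delta_{x^n})(y^n), \qquad v_q^n(z^n|x^n) := V_q^n(\delta_{x^n})(z^n). \tag{20} \]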

Since a central part of our work is to study AVWCs under joint use, we have to carefully
define what we mean here by “joint use”. Let $(\mathfrak{W}_1,\mathfrak{V}_1)$ and $(\mathfrak{W}_2,\mathfrak{V}_2)$ be two AVWCs. Since
state alphabets are finite in all of our work, we will without loss of generality assume that they
have a joint state set $\mathcal{S}$. We then define

\[ (\mathfrak{W}_1\otimes\mathfrak{W}_2, \mathfrak{V}_1\otimes\mathfrak{V}_2) := \big( W_1(\cdot|\cdot,s)\otimes W_2(\cdot|\cdot,s'),\; V_1(\cdot|\cdot,s)\otimes V_2(\cdot|\cdot,s') \big)_{s,s'\in\mathcal{S}}. \tag{21} \]

We now come to a more “classic” topic: the definition of codes, rates and capacities. From
the start, we will include the possibility of adding some extra variables like shared randomness 
or common randomness between Alice and Bob, but also the possibility for Alice to divide her 
message set into two parts: One which is to be kept secret from Eve and one which does not 
necessarily have to remain secret. 

We introduce three different classes of codes, which are defined in the following and related to 
each other as follows: The class of shared randomness assisted codes contains those which use 
common randomness and these again contain the uncorrelated codes. Formal definitions are as 
follows: 


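Definition 2 (Shared randomness assisted code). A shared randomness assisted code $\mathcal K_n$ for the AVWC $(\mathfrak W,\mathfrak V)$ consists of: a set $[K]$ of messages, two finite alphabets $[\Gamma],[\Gamma']$ and a set of stochastic encoders $e^\gamma\in\mathcal C([K],\mathcal X^n)$ (one for every value $\gamma\in[\Gamma]$) together with a collection $(D_k^{\gamma'})_{k=1,\gamma'=1}^{K,\Gamma'}$ of subsets of $\mathcal Y^n$ satisfying $D_k^{\gamma'}\cap D_{k'}^{\gamma'}=\emptyset$ for all $k\neq k'$ and for each $\gamma'$. In addition to that, there is a probability distribution $\mu\in\mathcal P([\Gamma]\times[\Gamma'])$. Every such code defines the joint random variables $\mathfrak S_{s^n} := (\mathfrak K,\mathfrak K',\mathfrak G,\mathfrak G',\mathfrak X^n,\mathfrak Y_{s^n},\mathfrak Z_{s^n})$ ($s^n\in\mathcal S^n$) which are distributed according to

$$\mathbb P\big(\mathfrak S_{s^n}=(k,k',\gamma,\gamma',x^n,y^n,z^n)\big) = \frac{1}{K}\,\mu(\gamma,\gamma')\,e^\gamma(x^n|k)\,\mathbb 1_{D_{k'}^{\gamma'}}(y^n)\,w_{s^n}(y^n|x^n)\,v_{s^n}(z^n|x^n). \tag{22}$$

The average error of $\mathcal K_n$ is

$$\mathrm{err}(\mathcal K_n) = 1-\min_{s^n\in\mathcal S^n}\sum_{k,\gamma,\gamma'=1}^{K,\Gamma,\Gamma'}\frac{\mu(\gamma,\gamma')}{K}\sum_{x^n\in\mathcal X^n}e^\gamma(x^n|k)\,w_{s^n}(D_k^{\gamma'}|x^n). \tag{23}$$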

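Definition 3 (Common randomness assisted code). A common randomness assisted code $\mathcal K_n$ for the AVWC $(\mathfrak W,\mathfrak V)$ consists of: a set $[K]$ of messages, a set $[\Gamma]$ of values for the common randomness and a set of stochastic encoders $e^\gamma\in\mathcal C([K],\mathcal X^n)$ (one for each element $\gamma\in[\Gamma]$), together with a collection $(D_k^\gamma)_{k=1,\gamma=1}^{K,\Gamma}$ of subsets of $\mathcal Y^n$ satisfying $D_k^\gamma\cap D_{k'}^\gamma=\emptyset$ for all $\gamma\in[\Gamma]$, whenever $k\neq k'$. Every such code defines the joint random variables $\mathfrak S_{s^n} := (\mathfrak K,\mathfrak K',\mathfrak G,\mathfrak X^n,\mathfrak Y_{s^n},\mathfrak Z_{s^n})$ ($s^n\in\mathcal S^n$) which are distributed according to

$$\mathbb P\big(\mathfrak S_{s^n}=(k,k',\gamma,x^n,y^n,z^n)\big) = \frac{1}{K\cdot\Gamma}\,e^\gamma(x^n|k)\,\mathbb 1_{D_{k'}^\gamma}(y^n)\,w_{s^n}(y^n|x^n)\,v_{s^n}(z^n|x^n). \tag{24}$$

The average error of $\mathcal K_n$ is

$$\mathrm{err}(\mathcal K_n) = 1-\min_{s^n\in\mathcal S^n}\frac{1}{K\cdot\Gamma}\sum_{k,\gamma=1}^{K,\Gamma}\sum_{x^n\in\mathcal X^n}e^\gamma(x^n|k)\,w_{s^n}(D_k^\gamma|x^n). \tag{25}$$

For technical reasons we also define, for all state sequences $s^n$, the corresponding average success probability of the code by

$$d_{s^n}(\mathcal K_n) := \frac{1}{K\cdot\Gamma}\sum_{k,\gamma=1}^{K,\Gamma}\sum_{x^n\in\mathcal X^n}e^\gamma(x^n|k)\,w_{s^n}(D_k^\gamma|x^n). \tag{26}$$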
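To illustrate the average error of a common randomness assisted code, the following minimal sketch evaluates it for blocklength $n=1$, reading the error as one minus the success probability in the worst state. The data layout (`W[s][x][y]`, `encoders[g][k][x]`, `decoders[g][k]`) and the toy AVC consisting of an identity channel and a bit flip are assumptions made for this illustration only.

```python
def average_error(W, encoders, decoders):
    """Average error of a blocklength-1 common randomness assisted code.

    W[s][x][y]        : legal link w_s(y|x)
    encoders[g][k][x] : stochastic encoder e^g(x|k)
    decoders[g][k]    : set of outputs decoded as message k when the common randomness is g
    """
    n_cr, n_msg = len(encoders), len(encoders[0])
    success_per_state = []
    for Ws in W:
        total = 0.0
        for g in range(n_cr):
            for k in range(n_msg):
                for x, prob_x in enumerate(encoders[g][k]):
                    total += prob_x * sum(Ws[x][y] for y in decoders[g][k])
        success_per_state.append(total / (n_msg * n_cr))
    # The jammer picks the state with the smallest success probability.
    return 1.0 - min(success_per_state)


# Toy AVC: state 0 is the identity, state 1 flips the bit.
W = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
identity_encoder = [[1.0, 0.0], [0.0, 1.0]]

err_no_cr = average_error(W, [identity_encoder], [[{0}, {1}]])
err_with_cr = average_error(
    W,
    [identity_encoder, identity_encoder],
    [[{0}, {1}], [{1}, {0}]],  # second decoder anticipates a flip
)
```

In this toy example the code without common randomness is defeated completely by the flipping state, whereas one shared bit reduces the worst-case average error to one half.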

One particularly interesting feature of AVCs is that it may be impossible to transmit even a small number of messages reliably from Alice to Bob without using shared randomness. If one is willing to spend only a polynomial amount of common randomness, however, the capacity of the channel jumps to the maximally attainable value, an effect which was discovered in [2].

If a whole communication network is being utilized, it may be possible to use one part of the network to establish common randomness between the legal parties (one could equally well speak of a secret key here), which is then used to send messages over another part of the system which may be symmetrizable. This idea was first established in [?]. In this work, we will give a more careful analysis of the underlying structure, an undertaking which motivates the following definition:


Definition 4 (Private/public code). A private/public code $\mathcal K_n$ for the AVWC $(\mathfrak W,\mathfrak V)$ consists of: two sets $[K],[L]$ of messages, an encoder $e\in\mathcal C([K]\times[L],\mathcal X^n)$, and a collection $(D_{kl})_{k=1,l=1}^{K,L}$ of subsets of $\mathcal Y^n$ satisfying $D_{kl}\cap D_{k'l'}=\emptyset$ whenever $(k,l)\neq(k',l')$. Every such code defines the joint random variables $\mathfrak S_{s^n} := (\mathfrak K,\mathfrak L,\mathfrak K',\mathfrak L',\mathfrak X^n,\mathfrak Y_{s^n},\mathfrak Z_{s^n})$ ($s^n\in\mathcal S^n$) which are distributed according to

$$\mathbb P\big(\mathfrak S_{s^n}=(k,l,k',l',x^n,y^n,z^n)\big) = \frac{1}{K\cdot L}\,e(x^n|k,l)\,\mathbb 1_{D_{k'l'}}(y^n)\,w_{s^n}(y^n|x^n)\,v_{s^n}(z^n|x^n). \tag{27}$$

The average error of $\mathcal K_n$ is

$$\mathrm{err}(\mathcal K_n) = 1-\min_{s^n\in\mathcal S^n}\sum_{k,l=1}^{K,L}\sum_{x^n\in\mathcal X^n}\frac{1}{K\cdot L}\,e(x^n|k,l)\,w_{s^n}(D_{kl}|x^n). \tag{28}$$

With this definition we can formalize the idea of “wasting” a few bits in order to guarantee secret communication. We would like to compare this approach to the case of a compound channel, where a sender that knows the channel parameters can send pilot sequences to the receiver in order to let him estimate the channel. The pilot sequences do not carry information from sender to receiver. With such a scheme, a sender with state information can transmit at strictly higher rates than one without. The higher capacity is reached by “wasting” some transmissions for the estimation. Since the number of channel uses that have to be spent on estimation grows only sub-linearly in the total number of channel uses, there is no negative impact on the message transmission rate in asymptotic scenarios.

In the case treated here it turns out that sending a small amount of non-secret messages is the key to increasing the secrecy capacity in specific situations. We would like to extend the formal background of this idea by allowing for a joint transmission of secret and non-secret messages:

Definition 5 (Private/public coding scheme). A private/public coding scheme operating at rates $(R_{\mathrm{pri}},R_{\mathrm{pub}})$ consists of a sequence $(\mathcal K_n)_{n\in\mathbb N}$ of private/public codes such that

$$\lim_{n\to\infty}\mathrm{err}(\mathcal K_n)=0, \tag{29}$$

$$\liminf_{n\to\infty}\frac1n\log(K_n)=R_{\mathrm{pri}}, \tag{30}$$

$$\liminf_{n\to\infty}\frac1n\log(L_n)=R_{\mathrm{pub}}, \tag{31}$$

$$\limsup_{n\to\infty}\max_{s^n\in\mathcal S^n}I(\mathfrak K_n;\mathfrak Z_{s^n}|\mathfrak L_n)=0. \tag{32}$$

A more restricted class of codes arises when there is only one type of message, which ought to be kept secret, and when the use of common randomness is allowed in addition.

Definition 6 (Common randomness assisted coding scheme satisfying mean secrecy criterion). A common randomness assisted coding scheme satisfying the mean secrecy criterion operating at rate $R$ consists of a sequence $(\mathcal K_n)_{n\in\mathbb N}$ of common randomness assisted codes such that

$$\lim_{n\to\infty}\mathrm{err}(\mathcal K_n)=0, \tag{33}$$

$$\liminf_{n\to\infty}\frac1n\log(K_n)=R, \tag{34}$$

$$\limsup_{n\to\infty}\max_{s^n\in\mathcal S^n}I(\mathfrak K_n;\mathfrak Z_{s^n}|\mathfrak G_n)=0. \tag{35}$$

Note that both Definition 5 and Definition 6 require the mutual information between the secret messages and the output at Eve’s site to be small on average, either over the public messages or over the common randomness. One may argue that this is a somewhat weak criterion. In our earlier paper [38] we compared the capacity arising from the use of coding schemes under Definition 6 to a capacity derived under more severe requirements on the secrecy criterion. We were able to demonstrate that the respective capacities coincide. It is not known to us whether a stricter requirement in Definition 5 would lead to a different capacity.

Definition 7 (Secure uncorrelated coding scheme). A secure uncorrelated coding scheme operating at rate $R$ consists of a sequence $(\mathcal K_n)_{n\in\mathbb N}$ of common randomness assisted codes with $\Gamma_n=1$ for all $n\in\mathbb N$ such that

$$\lim_{n\to\infty}\mathrm{err}(\mathcal K_n)=0, \tag{36}$$

$$\liminf_{n\to\infty}\frac1n\log(K_n)=R, \tag{37}$$

$$\limsup_{n\to\infty}\max_{s^n\in\mathcal S^n}I(\mathfrak K_n;\mathfrak Z_{s^n})=0. \tag{38}$$

Definition 8 (Secure coding scheme with secret common randomness). A secure coding scheme with secret common randomness $\mathcal K$ operating at rate $R$ and using an amount $G_{\mathcal K}>0$ of common randomness consists of a sequence $\mathcal K := (\mathcal K_n)_{n\in\mathbb N}$ of common randomness assisted codes with $\lim_{n\to\infty}\frac1n\log\Gamma_n=G_{\mathcal K}$ such that

$$\lim_{n\to\infty}\mathrm{err}(\mathcal K_n)=0, \tag{39}$$

$$\liminf_{n\to\infty}\frac1n\log(K_n)=R, \tag{40}$$

$$\limsup_{n\to\infty}\max_{s^n\in\mathcal S^n}I(\mathfrak K_n;\mathfrak Z_{s^n})=0. \tag{41}$$

Remark 2. The reader may wonder why the common randomness is only being quantified for secrecy schemes where the common randomness is kept secret. The reason for this becomes clear when reading [?], where it is proven that any shared randomness needed in order to achieve the correlated random coding mean secrecy capacity can always be assumed to be no larger than polynomially many bits of common randomness. These small amounts are not counted in the definition of the respective capacity. This result from [?] was applied in our earlier paper [38] as well.


Since we completely restrict our analysis to the case where the system uses common randomness, we can spare a few indices to distinguish the different sources of external randomness:

Definition 9 (Secrecy capacities). Given $(\mathfrak W,\mathfrak V)$, we define for every $G>0$ the secret common randomness assisted secrecy capacity as

$$C_{\mathrm{key}}(\mathfrak W,\mathfrak V,G) := \sup\Big\{R:\ \text{there exists a secret common randomness assisted coding scheme }\mathcal K\text{ at rate }R\text{ with }G_{\mathcal K}=G\Big\}. \tag{42}$$

The uncorrelated coding secrecy capacity and the correlated random coding mean secrecy capacity are defined just as in [38]:

$$C_S(\mathfrak W,\mathfrak V) := \sup\Big\{R:\ \text{there exists a secure uncorrelated coding scheme operating at rate }R\Big\}, \tag{43}$$

$$C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V) := \sup\Big\{R:\ \text{there exists a secure common randomness assisted coding scheme satisfying the mean secrecy criterion operating at rate }R\Big\}. \tag{44}$$

We refrain from defining the rate region for private and public messages in this work, and restrict ourselves to considering only the boundary of that region that arises from letting $R_{\mathrm{pub}}$ be arbitrarily small. This does for example allow us to transmit any finite number of messages, or numbers of messages that scale sub-exponentially in the number of channel uses.




Definition 10 (Private/public secrecy capacity). The private/public secrecy capacity is given by

$$C_{\mathrm{pp}}(\mathfrak W,\mathfrak V) := \sup\Big\{R:\ \text{there exists a private/public coding scheme at rates }(R_{\mathrm{pri}},R_{\mathrm{pub}})\text{ such that }R=R_{\mathrm{pri}}\Big\}. \tag{45}$$


The above definition explicitly allows for the super-activation strategy of [?] to be used, and shall be explained using this example first. Before we do so, let us give the formal definition of super-activation:


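Definition 11 (Super-activation). Let $(\mathfrak W_1,\mathfrak V_1)$ and $(\mathfrak W_2,\mathfrak V_2)$ be AVWCs. Then $(\mathfrak W_1,\mathfrak V_1)$, $(\mathfrak W_2,\mathfrak V_2)$ are said to show super-activation if $C_S(\mathfrak W_1,\mathfrak V_1)=C_S(\mathfrak W_2,\mathfrak V_2)=0$ but $C_S(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2)>0$.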


Now set $\mathfrak W := \mathfrak W_1\otimes\mathfrak W_2$ and $\mathfrak V := \mathfrak V_1\otimes\mathfrak V_2$. In order to simplify the discussion, one may additionally set $\mathfrak V_2=\mathfrak W_2=(\mathrm{Id})$, where $\mathrm{Id}\in\mathcal C([2],[2])$, and assume that $\mathfrak W_1$ is symmetrizable but that $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_1,\mathfrak V_1)=\alpha>0$. It follows that $C_{\mathrm{pp}}(\mathfrak W_1,\mathfrak V_1)=C_{\mathrm{pp}}(\mathfrak W_2,\mathfrak V_2)=0$, because of symmetrizability and since decoding of the messages that are sent via $(\mathfrak W_2,\mathfrak V_2)$ is possible without any error both for Bob and for Eve. These messages may therefore be treated as common randomness that is known by Eve. We know that already a number of public messages that grows polynomially in $n$ provides enough common randomness to remove any effect arising from symmetrizability of $\mathfrak W_1$. Since the code arising from the combination of sending and decoding public messages via $(\mathrm{Id},\mathrm{Id})$ and private messages via $(\mathfrak W_1,\mathfrak V_1)$ is a coding scheme that fits under Definition 5, we get $C_{\mathrm{pp}}(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2)\ge\alpha>0$.

That such a scheme works as well when $C_S$ is considered instead of $C_{\mathrm{pp}}$ can be understood as follows:

Let two AVWCs $(\mathfrak W_1,\mathfrak V_1)$ and $(\mathfrak W_2,\mathfrak V_2)$ be given. Let $\mathfrak W_1$ be symmetrizable, but such that $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_1,\mathfrak V_1)=\alpha>0$. Since $\mathfrak W_1$ is symmetrizable we have $C_S(\mathfrak W_1,\mathfrak V_1)=0$. If no additional resources are available, the surplus $\alpha$ in the common-randomness assisted secrecy capacity cannot be put to use. Let now $C_S(\mathfrak W_2,\mathfrak V_2)=0$ but $C_S(\mathfrak W_2,\mathfrak T)=\beta>0$ ($\mathfrak T$ denotes the trash channel, so this just means that it is possible to reliably transmit messages over $\mathfrak W_2$). Then

$$C_S(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2)\ge\alpha>0 \tag{46}$$

and the reason for this effect is that (as before when we considered $C_{\mathrm{pp}}$) a small amount of messages can be sent over $\mathfrak W_2$ and is then used as common randomness, therefore increasing the rate of messages that can be sent reliably over $\mathfrak W_1$ from zero to $\alpha$. Of course, the messages sent over $\mathfrak W_2$ can be read by Eve. That this causes no problems with the security requirements can be seen by defining a toy model where only two parallel channels with respective adversarially controlled channel states are considered. This is done as follows:

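Let us define random variables $\mathfrak S_{s,\hat s}=(\mathfrak M,\hat{\mathfrak M},\mathfrak Z_{1,s},\mathfrak Z_{2,\hat s})$ where

$$\mathbb P\big(\mathfrak S_{s,\hat s}=(m,\hat m,z,\hat z)\big)=\frac{1}{M\cdot\hat M}\,w_{1,s}(z|m,\hat m)\,w_{2,\hat s}(\hat z|\hat m) \tag{47}$$

and the channels $\{W_{1,s}\}_{s\in\mathcal S}$ and $\{W_{2,\hat s}\}_{\hat s\in\hat{\mathcal S}}$ can be controlled by James separately. It is understood that $m$ are the messages, whereas $\hat m$ are the values of the shared randomness that is distributed between Alice and Bob by using $\{W_{2,\hat s}\}_{\hat s\in\hat{\mathcal S}}$. We assume that for some small $\epsilon\ge0$ we have

$$\max_{s\in\mathcal S}I(\mathfrak M;\mathfrak Z_{1,s}|\hat{\mathfrak M})\le\epsilon. \tag{48}$$

Observe that $\mathfrak Z_{2,\hat s}$ depends solely on $\hat{\mathfrak M}$ via the channel $W_{2,\hat s}$ (this is where the fact that the two arbitrarily varying channels are used in parallel enters), so that the data processing inequality yields for every $s,\hat s$ that

$$I(\mathfrak M;\mathfrak Z_{1,s},\mathfrak Z_{2,\hat s})\le I(\mathfrak M;\mathfrak Z_{1,s},\hat{\mathfrak M}). \tag{49}$$

It is a consequence of the independence between $\mathfrak M$ and $\hat{\mathfrak M}$ that we can (for every $s$ and $\hat s$) then continue this chain of estimates as follows:

$$\begin{aligned}
I(\mathfrak M;\mathfrak Z_{1,s},\mathfrak Z_{2,\hat s}) &\le I(\mathfrak M;\mathfrak Z_{1,s},\hat{\mathfrak M})\\
&= H(\mathfrak M)+H(\mathfrak Z_{1,s},\hat{\mathfrak M})-H(\mathfrak M,\mathfrak Z_{1,s},\hat{\mathfrak M})\\
&= H(\mathfrak M)+\big[H(\mathfrak Z_{1,s},\hat{\mathfrak M})-H(\hat{\mathfrak M})\big]-\big[H(\mathfrak M,\mathfrak Z_{1,s},\hat{\mathfrak M})-H(\hat{\mathfrak M})\big]\\
&= H(\mathfrak M)+H(\mathfrak Z_{1,s}|\hat{\mathfrak M})-H(\mathfrak M,\mathfrak Z_{1,s}|\hat{\mathfrak M})\\
&= H(\mathfrak M|\hat{\mathfrak M})+H(\mathfrak Z_{1,s}|\hat{\mathfrak M})-H(\mathfrak M,\mathfrak Z_{1,s}|\hat{\mathfrak M})\\
&= I(\mathfrak M;\mathfrak Z_{1,s}|\hat{\mathfrak M})\\
&\le\epsilon.
\end{aligned}$$

Thus it is clear that, in addition,

$$\max_{s,\hat s}I(\mathfrak M;\mathfrak Z_{1,s},\mathfrak Z_{2,\hat s})\le\epsilon. \tag{57}$$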
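The estimate can be checked numerically on small instances of the toy model. The following is a minimal sketch for one fixed pair of states: the alphabet sizes, the randomly drawn channels and the helper names are assumptions made for illustration. It builds the joint distribution of (message, shared randomness, Eve's first output, Eve's second output) with uniform, independent message and shared randomness, and compares the three mutual informations that appear in the argument.

```python
import random
from itertools import product
from math import log2


def mutual_information(joint, a_idx, b_idx):
    """I(A;B) in bits, where A and B are the coordinates a_idx and b_idx of the outcomes."""
    pa, pb, pab = {}, {}, {}
    for outcome, pr in joint.items():
        a = tuple(outcome[i] for i in a_idx)
        b = tuple(outcome[i] for i in b_idx)
        pa[a] = pa.get(a, 0.0) + pr
        pb[b] = pb.get(b, 0.0) + pr
        pab[(a, b)] = pab.get((a, b), 0.0) + pr
    return sum(pr * log2(pr / (pa[a] * pb[b])) for (a, b), pr in pab.items() if pr > 0)


def random_pmf(size, rng):
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
n_msg, n_cr, n_out = 3, 2, 4  # hypothetical alphabet sizes
w1 = {(m, mh): random_pmf(n_out, rng) for m in range(n_msg) for mh in range(n_cr)}
w2 = {mh: random_pmf(n_out, rng) for mh in range(n_cr)}

# Outcomes are tuples (m, m_hat, z, z_hat); m and m_hat are uniform and independent.
joint = {
    (m, mh, z, zh): w1[(m, mh)][z] * w2[mh][zh] / (n_msg * n_cr)
    for m, mh, z, zh in product(range(n_msg), range(n_cr), range(n_out), range(n_out))
}

leak_to_eve = mutual_information(joint, (0,), (2, 3))      # I(M; Z1, Z2)
upper_bound = mutual_information(joint, (0,), (2, 1))      # I(M; Z1, M_hat)
independence = mutual_information(joint, (0,), (1,))       # I(M; M_hat), zero here
conditional = upper_bound - independence                   # I(M; Z1 | M_hat) by the chain rule
```

For independent message and shared randomness, `conditional` coincides with `upper_bound`, and `leak_to_eve` does not exceed it, in line with the data processing step.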

It is also evident that this argument ceases to hold true when the channels that are used for transmission of $\mathfrak M$ and of $\hat{\mathfrak M}$ are not orthogonal anymore. Our sketch indicates why the protocol developed in [16] is able to meet the secrecy requirement in Definition 7.

Having indicated why the super-activation protocol works, we now switch the topic and highlight a few connections to related problems and technical difficulties:

It is evident from the existing literature on AVCs [5], arbitrarily varying classical-quantum channels [?] and on the quantification of shared randomness [?] that the latter is not an easy task. A brief overview concerning the connections between quantification of shared randomness and arbitrarily varying channels has been given in [?]. Our focus here is on systems that use only common randomness in various different ways.

In our previous work [38] we developed a formula for $C^{\mathrm{mean}}_{S,\mathrm{ran}}$. The proof, extending the results established in [?] and [?], displays clearly that already amounts of common randomness which scale polynomially in the blocklength $n$ are sufficient for achieving the full random capacity. Moreover, an exact quantification of the amount of shared randomness is not necessary when speaking about the correlated random coding mean secrecy capacity. Either no shared randomness is allowed in the sense that $\Gamma_n=1$ for all $n\in\mathbb N$, or else one allows arbitrarily large amounts of it but then only uses the above-mentioned polynomial amount.

With the functions $G\mapsto C_{\mathrm{key}}(\mathfrak W,\mathfrak V,G)$ the story is a different one, as the following interesting behaviour occurs: They are well-defined for all $G>0$. However, when $G=0$ they are not unambiguously defined anymore, as it is clearly possible to take e.g. a sequence $(\Gamma_n)_{n\in\mathbb N}$ that grows polynomially in $n$. In that case, $G=\lim_{n\to\infty}\frac1n\log\Gamma_n=0$, but the amount of randomness is sufficient in the sense that for every $\epsilon>0$ there exists a sequence $(\mathcal K_n)_{n\in\mathbb N}$ of codes which use only the common randomness $\Gamma_n$, operate at a rate $R_\epsilon=C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V)-\epsilon$ and are both asymptotically secure and satisfy $\lim_{n\to\infty}\mathrm{err}(\mathcal K_n)=0$. Thus, purely from the mathematical definition of $C_{\mathrm{key}}(\mathfrak W,\mathfrak V,G)$, one would be tempted to set $C_{\mathrm{key}}(\mathfrak W,\mathfrak V,0)=C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V)$. However, from the operational point of view this is unsatisfying: imagine taking the statement “no common randomness” literally, and therefore setting $\Gamma_n=1$ for all $n\in\mathbb N$. Let $\mathfrak W$ be a symmetrizable AVC. In that case there is no chance to reliably transmit even a small amount of messages with $\Gamma_n=1$ for all $n\in\mathbb N$ [?].

It is thus clear that $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V)=\lim_{G\to0}C_{\mathrm{key}}(\mathfrak W,\mathfrak V,G)$ holds, but that it at least seems to be a difficulty to give a both operationally meaningful and mathematically satisfying definition of $C_{\mathrm{key}}(\mathfrak W,\mathfrak V,0)$ (see e.g. [37] for a possible approach to such type of problem).


A quantity which will prove to be of importance during our proofs and when quantifying how close an AVC is to being symmetrizable is defined as follows: We let $M_{\mathrm{fin}}$ denote the set of all finite sets of elements of $\mathcal C(\mathcal X,\mathcal Y)$.

Definition 12. The function $F:M_{\mathrm{fin}}\to\mathbb R_+$ is defined via setting, for each $\mathfrak W'=(W'(\cdot|\cdot,s))_{s\in\mathcal S}\in M_{\mathrm{fin}}$,

$$F(\mathfrak W') := \min_{U\in\mathcal C(\mathcal X,\mathcal S)}\ \max_{x\neq x'}\ \Big\|\sum_{s\in\mathcal S}u(s|x)\,W'(\delta_{x'}\otimes\delta_s)-\sum_{s\in\mathcal S}u(s|x')\,W'(\delta_x\otimes\delta_s)\Big\|_1. \tag{58}$$

This function obviously has the property that for every AVC $\mathfrak W'$, the statement $F(\mathfrak W')=0$ is equivalent to $\mathfrak W'$ being symmetrizable.
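For small alphabets the quantity $F$ can be approximated by brute force. The following minimal sketch assumes that $F$ is read as the minimum over all channels $U$ from inputs to states of the largest $\ell_1$ symmetrization gap over pairs of distinct inputs, and it is restricted to two inputs and two states so that $U$ has only two free parameters. The function names, the grid resolution and the two test channels are assumptions made for illustration; a grid search only yields an upper bound on the true minimum.

```python
def symmetrization_gap(W, U):
    """Largest l1 gap over input pairs x != x' for a candidate symmetrizer U.

    W[s][x][y] is w(y|x,s) and U[x][s] is u(s|x).
    """
    n_states, n_inputs, n_outputs = len(W), len(W[0]), len(W[0][0])
    worst = 0.0
    for x in range(n_inputs):
        for xp in range(n_inputs):
            if x == xp:
                continue
            diff = [
                sum(U[x][s] * W[s][xp][y] - U[xp][s] * W[s][x][y] for s in range(n_states))
                for y in range(n_outputs)
            ]
            worst = max(worst, sum(abs(d) for d in diff))
    return worst


def f_grid(W, steps=100):
    """Grid approximation of F for AVCs with two inputs and two states."""
    best = float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1):
            a, b = i / steps, j / steps
            best = min(best, symmetrization_gap(W, [[a, 1.0 - a], [b, 1.0 - b]]))
    return best


# An AVC whose output ignores the input is symmetrized by any U with identical rows.
W_input_blind = [[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]]
# An AVC in which the third output symbol can only be produced by the second input.
W_separated = [[[1.0, 0.0, 0.0], [0.6, 0.2, 0.2]], [[0.0, 1.0, 0.0], [0.1, 0.3, 0.6]]]

f_blind = f_grid(W_input_blind, steps=20)
f_separated = f_grid(W_separated, steps=20)
```

For the second channel the third output symbol carries weight at least 0.2 in every mixture of the second-input outputs and weight zero otherwise, so the gap is at least 0.4 for every $U$ and the channel is non-symmetrizable.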


3 Main Results 

In this section we list our main results. We start with a coding theorem concerning the secret common randomness assisted secrecy capacity whose direct part is based on our Lemma 1 that we state directly afterwards. We continue with a second and even more delicate lemma, which is an extension of [?, Lemma 3] to AVWCs. This lemma (Lemma 2) is important: it provides a direct (coding) part for Theorem 2, which addresses the influence of the symmetrizability condition (3) on the capacity $C_S$ and thereby relates it to $C^{\mathrm{mean}}_{S,\mathrm{ran}}$.

Our last result connects to the work [16], which showed a very surprising effect that has so far not been observed for classical information-carrying systems: super-activation. We give a precise characterization of the conditions which lead to super-activation in Theorem 5.

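Theorem 1 (Coding Theorem for secret common randomness assisted secrecy capacity). Let $(\mathfrak W,\mathfrak V)$ be an AVWC. For every $n\in\mathbb N$, set $\mathcal U_n := [|\mathcal X|^n]$. Define

$$C^*(\mathfrak W,\mathfrak V) := \lim_{n\to\infty}\frac1n\max_{p\in\mathcal P(\mathcal U_n)}\ \max_{U\in\mathcal C(\mathcal U_n,\mathcal X^n)}\Big(\min_{q\in\mathcal P(\mathcal S)}I(p;W_q^{\otimes n}\circ U)-\max_{s^n\in\mathcal S^n}I(p;V_{s^n}\circ U)\Big). \tag{59}$$

It holds (with $\mathfrak T=(T)$ denoting the AVC consisting only of the memoryless channel that assigns the uniform output distribution to every input symbol),

$$C_{\mathrm{key}}(\mathfrak W,\mathfrak V,G)=\min\{C^*(\mathfrak W,\mathfrak V)+G,\ C^*(\mathfrak W,\mathfrak T)\}. \tag{60}$$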

Of course, $C^*(\mathfrak W,\mathfrak T)$ is the capacity of the AVC $\mathfrak W$ under average error. This capacity has a single-letter description. Since the first argument in the above minimum is not single-letter, it remains open whether this characterization can be improved or, if not, for which values of $G$ a description in terms of a single-letter quantity is possible and for which not. Apart from the complicated multi-letter form, an important take-away from the above formula is that the following is true:

Corollary 1. For every $G>0$, the function $(\mathfrak W,\mathfrak V)\mapsto C_{\mathrm{key}}(\mathfrak W,\mathfrak V,G)$ is continuous.

Remark 3. If $G=0$ in the sense that $\Gamma_n=1$ for all $n\in\mathbb N$, then for all AVWCs $(\mathfrak W,\mathfrak V)$ we know that $C_{\mathrm{key}}(\mathfrak W,\mathfrak V,G)$ equals $C_S(\mathfrak W,\mathfrak V)$.
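The trade-off between the key rate $G$ and the secrecy capacity in the formula of Theorem 1 can be tabulated with a few lines of code. This is a minimal sketch: the two capacity values passed to the function are hypothetical placeholders, since evaluating the multi-letter quantity $C^*(\mathfrak W,\mathfrak V)$ itself is not attempted here.

```python
def c_key(c_star_wiretap, c_star_legal, key_rate):
    """Evaluate min{C*(W,V) + G, C*(W,T)} for a given key rate G > 0."""
    if key_rate <= 0:
        raise ValueError("the formula is stated for G > 0")
    return min(c_star_wiretap + key_rate, c_star_legal)


# Hypothetical values for C*(W,V) and C*(W,T), chosen for illustration only.
C_WIRETAP, C_LEGAL = 0.2, 0.7

rates = {G: c_key(C_WIRETAP, C_LEGAL, G) for G in (0.1, 0.5, 1.0)}
```

As soon as $G\ge C^*(\mathfrak W,\mathfrak T)-C^*(\mathfrak W,\mathfrak V)$, the minimum is attained by the second argument, which is the regime in which the capacity is given by a single-letter quantity.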

We are getting closer to the technical core of our work now. The next Lemma is essential to proving the direct part of Theorem 1. It quantifies how many messages $L$ and how many different values $\Gamma$ for the common randomness are needed in order to make the output distributions at Eve’s site independent from the chosen message $k$.

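Lemma 1. For every $\tau>0$ there exists a value $\nu(\tau)>0$ and an $N_0(\tau)$ such that for all $n\ge N_0(\tau)$ and natural numbers $K,L,\Gamma$ and type $p\in\mathcal P_0^n(\mathcal X)$ there exist codewords $(x_{kl\gamma})_{k,l,\gamma=1}^{K,L,\Gamma}$ in $T_p\subset\mathcal X^n$ and decoding sets $D_{kl}^\gamma\subset\mathcal Y^n$ obeying $D_{kl}^\gamma\cap D_{k'l'}^\gamma=\emptyset$ if $(k,l)\neq(k',l')$, such that we have:

If $\frac1n\log(K\cdot L)\le\min_{q\in\mathcal P(\mathcal S)}I(p;W_q)-\nu(\tau)$ and $\Gamma\ge\dots$ then

$$\dots \tag{61}$$

If $\frac1n\log(L\cdot\Gamma)\ge\max_q I(p;V_q)+\tau$ then

$$\max_{s^n,k}\Big\|\frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma}V_{s^n}(\cdot|x_{kl\gamma})-\mathbb E\,V_{s^n}(\cdot|X^n)\Big\|_1\le2^{-n\nu(\tau)}, \tag{62}$$

where $X^n$ is distributed according to $\mathbb P(X^n=x^n) := \dots$; the dependence of $\nu$ on $\tau$ is such that $\lim_{\tau\to0}\nu(\tau)=0$.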


While Lemma 1 delivers the correct interplay between, and scaling of, the number of secret messages $K$, the number of additional messages $L$ that are just being sent in order to obfuscate Eve, and the number of values for the (secret) common randomness $\Gamma$ that are being used up in the process, it is insufficient for dealing with the case when $\Gamma$ is set to one or is kept very small. For those cases where the secret or partially secret common randomness $\Gamma$ is set to one for every number of channel uses, we have to deal with the symmetrizability properties of the legal link $\mathfrak W$ from Alice to Bob. Initial statements in that case are as follows:


Theorem 2 (Symmetrizability properties of $C_S$). Let $(\mathfrak W,\mathfrak V)$ be an AVWC.

1. If $\mathfrak W$ is symmetrizable, then $C_S(\mathfrak W,\mathfrak V)=0$.

2. If $\mathfrak W$ is non-symmetrizable, then $C_S(\mathfrak W,\mathfrak V)=C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V)$.

We now start to take on a slightly different point of view, under which the AVWC becomes an object that has some parameters which can be subject to changes. When considering practical deployment aspects, such a point of view is necessary, as all information we may have gathered about the channel during, for example, a training phase may not be accurate enough to model the real-world behaviour. Thus one needs to understand whether a slight error in the parameters may lead to catastrophic events, and this is the content of our next theorem.

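Theorem 3 (Stability of $C_S$). Let $(\mathfrak W,\mathfrak V)$ be an AVWC. If $(\mathfrak W,\mathfrak V)$ satisfies $C_S(\mathfrak W,\mathfrak V)>0$ then there is an $\epsilon>0$ such that for all $(\mathfrak W',\mathfrak V')$ satisfying $d((\mathfrak W,\mathfrak V),(\mathfrak W',\mathfrak V'))\le\epsilon$ we have $C_S(\mathfrak W',\mathfrak V')>0$.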

However, despite the reassuring statement of Theorem 3, care has to be taken at some points, which are characterized below.

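Theorem 4 (Discontinuity properties of $C_S$). Let $(\mathfrak W,\mathfrak V)$ be an AVWC.

1. The function $C_S$ is discontinuous at the point $(\mathfrak W,\mathfrak V)$ if and only if the following hold: First, $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V)>0$ and second $F(\mathfrak W)=0$ but for all $\epsilon>0$ there is $\mathfrak W_\epsilon$ such that $d(\mathfrak W,\mathfrak W_\epsilon)<\epsilon$ and $F(\mathfrak W_\epsilon)>0$.

2. If $C_S$ is discontinuous in the point $(\mathfrak W,\mathfrak V)$ then it is discontinuous for all $\hat{\mathfrak V}$ for which $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\hat{\mathfrak V})>0$.

Note that $F(\mathfrak W)=0$ is equivalent to $\mathfrak W$ being symmetrizable, a property which is defined in the introduction in equation (3). The function $F$ itself is the content of Definition 12.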

The take-away from the above Theorem is two-fold: First, it delivers a criterion for finding a point of discontinuity that only requires the validation that (a continuous function) $C^{\mathrm{mean}}_{S,\mathrm{ran}}$ is nonzero in a specific point and the running of a convex optimization problem (calculation of $F$ in that point). Second, it becomes clear that any discontinuity of the capacity $C_S$ arises solely from effects that stem from the “legal” link $\mathfrak W$: changing $\mathfrak V$ has no effect on discontinuity.

Corollary 2. For every $\mathfrak W$, the function $\mathfrak V\mapsto C_S(\mathfrak W,\mathfrak V)$ is continuous.

Note that discontinuity is caused both by the legal link $\mathfrak W$ (see statement 1) and the link $\mathfrak V$ to Eve (statement 2), but depends on $\mathfrak V$ only insofar as the capacity $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V)$ has to stay above zero in order for a discontinuity to occur.

Theorem 4 also delivers an efficient way for calculating whether $C_S$ is discontinuous in a specific point or not: One only needs to give a good-enough approximation of the continuous function $C^{\mathrm{mean}}_{S,\mathrm{ran}}$, then run a convex optimization in order to calculate $F(\mathfrak W)$. Regarding future research, it may therefore be of interest to quantify the degree of continuity of the capacity of arbitrarily varying channels in those regions where it is continuous.

Remark 4. It is necessary to request the existence of the $\mathfrak W_\epsilon$ in the first statement of Theorem 4, and an easy example why this is so is the following:

Define $W_{i,\epsilon}\in\mathcal C(\{1,2\},\{1,2,3\})$ for $i=1,2$ and $\epsilon\in[0,1/2]$ by

$$\dots \tag{63}$$

For every $\epsilon\in[0,1/2]$, these AVCs are symmetrizable with $u(1|1)=\epsilon/(1-\epsilon)$ and $u(1|2)=(1-2\epsilon)/(1-\epsilon)$. The reason for this is that for every $\epsilon\in[0,1/2]$ the convex sets $\mathrm{conv}(\{W_{1,\epsilon}(\delta_1),W_{2,\epsilon}(\delta_1)\})$ and $\mathrm{conv}(\{W_{1,\epsilon}(\delta_2),W_{2,\epsilon}(\delta_2)\})$ have non-empty intersections. It is also geometrically clear that for any $\epsilon\in(0,1/2)$, there will be a small vicinity of AVCs which share this property. Thus, around such a $\mathfrak W_\epsilon$ all other AVCs are symmetrizable as well, and for every $\mathfrak V$ we therefore have both $C_S(\mathfrak W_\epsilon,\mathfrak V)=0$ and $C_S(\mathfrak W',\mathfrak V)=0$ whenever $d(\mathfrak W_\epsilon,\mathfrak W')$ is small enough.

It is additionally clear from [?] that $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_\delta,\mathfrak T)>0$ and that it is therefore (since $C^{\mathrm{mean}}_{S,\mathrm{ran}}$ is continuous by the results of [?]) possible to choose $\mathfrak V$ and $\delta>0$ such that $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_\delta,\mathfrak V)>0$, $C_S(\mathfrak W_\delta,\mathfrak V)=0$, and $C_S(\mathfrak W',\mathfrak V)=0$ whenever $d(\mathfrak W_\delta,\mathfrak W')$ is small enough.

It is easy to see that the AVC $\mathfrak W_0$ does not share this property: Although $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_0,\mathfrak T)>0$, it is easy to find explicit examples of AVCs $\mathfrak W'$ which are arbitrarily close to $\mathfrak W_0$ but are non-symmetrizable.

Of course, even the nicest characterization of a set of interesting objects is of little use if the set turns out to be empty. Fortunately, it has been proven in [18] by explicit example that the function mapping an AVC $\mathfrak W$ to its capacity has discontinuity points.

Such an example is also given by $(\mathfrak W_0,\mathfrak T)$ with $\mathfrak W_0$ taken from above.

Remark 5. The capacity $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V)$ was quantified in [?]. It satisfies

$$C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V)=\lim_{G\to0}C_{\mathrm{key}}(\mathfrak W,\mathfrak V,G)=C^*(\mathfrak W,\mathfrak V). \tag{64}$$

The proofs of Theorem 1 and Theorem 2 are carried out by providing coding strategies. The proof of the direct part of Theorem 2 extends the techniques from [22] by adding constraints on the random code that lead to it having additional security features. These features are quantified in the following Lemma:

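Lemma 2. For any $\tau>0$ and $\beta>0$, there exists a value $\nu(\tau)>0$ and an $N_0(\tau)$ such that for all $n\ge N_0(\tau)$, natural numbers $K,L,\Gamma$ satisfying $K\cdot L\ge2^{n\tau}$ and type $p\in\mathcal P_0^n(\mathcal X)$ satisfying $\min_{x:p(x)>0}p(x)\ge\beta$ there exist codewords $(x_{kl\gamma})_{k,l,\gamma=1}^{K,L,\Gamma}$ in $T_p\subset\mathcal X^n$, and a $c'>0$ such that if $\dots$ and upon setting $R=\frac1n\log(K\cdot L)$ we have

$$\dots \tag{65}$$

$$\max_{\gamma,s^n}\big|\{(k,l):\ I(x_{kl\gamma};s^n)\ge\tau\}\big|\le K\cdot L\cdot2^{-n\tau/2}, \tag{66}$$

$$\max_{\gamma,s^n}\big|\{(k,l,\gamma):\ \text{there is }(k',l',\gamma')\neq(k,l,\gamma)\text{ such that }\dots-|R-I(x_{k'l'\gamma'};s^n)|^+>\tau\}\big|\le K\cdot L\cdot2^{-n\tau/2}, \tag{67}$$

$$\frac{\log(L\cdot\Gamma)}{n}\ge\max_{q\in\mathcal P(\mathcal S)}I(p;V_q)+\tau\ \Rightarrow\ \max_{s^n,k}\Big\|\frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma}\dots\Big\|\le\dots, \tag{68}$$

where $X^n$ is distributed according to $\mathbb P(X^n=x^n) := \dots$; the dependence of $\nu$ on $\tau$ is such that $\lim_{\tau\to0}\nu(\tau)=0$.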


Our intention is to apply this Lemma to AVWCs for which the link between Alice and Bob is not symmetrizable. While Lemma 2 contains the possibility to use shared randomness $\Gamma$, this is not necessary in the application intended by us in this work (we use it only with $\Gamma$ set to one). The main reason for keeping $\Gamma$ as a variable in our proof is that it allows us to deliver a unified treatment of the whole topic, increases the generality of the Lemma and does not require much additional work.

Remark 6. The properties (65), (66) and (67) of the code are identical to those stated in [?, Lemma 3]. This Lemma again is the main ingredient to the proof of [?] that non-symmetrizability (symmetrizability is defined in (3)) is sufficient for message transmission over AVCs if the average error criterion and non-randomized codes are used. Our strategy thus is to use the properties (65), (66) and (67) in Lemma 2 in order to ensure successful message transmission over the legal link, if $\mathfrak W$ is non-symmetrizable.

The main tool used by Csiszár and Narayan for proving properties (65), (66) and (67) of Lemma 2 in their work was large deviation theory, and this is where we can make the connection to our work and prove the additional properties via application of the Chernoff bound.

Roughly speaking, this method of proof amounts to adding some additional requirements in a situation where any exponential number of additional requirements can be satisfied simultaneously.

When utilizing Lemma 2 (with $\Gamma=1$) in the proof of Theorem 2, one sees that while reliable transmission is achieved via fulfillment of conditions (65), (66) and (67) in Lemma 2 if and only if the legal link $\mathfrak W$ is non-symmetrizable, the security of the communication can always be achieved by making $L$ large enough. This implies that there are generic communication systems (AVWCs with a symmetrizable legal link) for which it is much easier to design codes that convey little information to Eve than codes which ensure robust communication.

In order to derive from Lemma 2 the connection between symmetrizability and the capacity $C_S$ (which is the content of Theorem 2) it is necessary to prove not only achievability of quantities like e.g. $\min_q I(p;W_q)-\max_q I(p;V_q)$ but also of quantities like $\min_q I(p';W_q^{\otimes n}\circ U)-\max_{s^n}I(p';V_{s^n}\circ U)$ that involve multiple channel uses and pre-coding that is defined via the optimization problem (59). Such a process of adding pre-coding may unfortunately cause the AVWC arising from the concatenation of pre-coding and the original AVWC to be symmetrizable. This highly interesting interplay of pre-coding and symmetrizability is quantified in the next Lemma and the following example.

Lemma 3. Let $\mathfrak W$ be an arbitrarily varying channel with input alphabet $\mathcal A$, output alphabet $\mathcal B$ and state set $\mathcal S$. Let $T\in\mathcal C(\mathcal A',\mathcal A)$ be a channel. Let $\mathfrak W'$ be the arbitrarily varying channel with input alphabet $\mathcal A'$, output alphabet $\mathcal B$ and state set $\mathcal S$ defined by $w'(b|a',s) := \sum_{a\in\mathcal A}w(b|a,s)\,t(a|a')$ (or, equivalently, via setting $W'_s := W_s\circ T$ for all $s\in\mathcal S$).

If $\mathfrak W$ is symmetrizable then $\mathfrak W'$ is symmetrizable as well.
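The mechanism behind Lemma 3 can be illustrated numerically. The following minimal sketch assumes one natural candidate, not taken from the original text: if $U$ symmetrizes $\mathfrak W$, then the concatenation $U' := U\circ T$ should symmetrize $\mathfrak W'$. The binary adder channel used as a symmetrizable example, the randomly drawn pre-coding $T$ and all function names are assumptions made for illustration.

```python
import random


def compose_precoding(W, T):
    """Return W' with W'[s][a'][b] = sum_a T[a'][a] * W[s][a][b]."""
    return [
        [[sum(T[ap][a] * Ws[a][b] for a in range(len(Ws))) for b in range(len(Ws[0]))]
         for ap in range(len(T))]
        for Ws in W
    ]


def symmetrizes(W, U, tol=1e-12):
    """Check sum_s U[x][s] W[s][x'][y] == sum_s U[x'][s] W[s][x][y] for all x, x', y."""
    n_states, n_inputs, n_outputs = len(W), len(W[0]), len(W[0][0])
    return all(
        abs(sum(U[x][s] * W[s][xp][y] - U[xp][s] * W[s][x][y] for s in range(n_states))) <= tol
        for x in range(n_inputs) for xp in range(n_inputs) for y in range(n_outputs)
    )


# Binary adder AVC: the output is a + s. It is symmetrized by U(s|a) = 1 if s == a.
W = [[[1.0 if y == a + s else 0.0 for y in range(3)] for a in range(2)] for s in range(2)]
U = [[1.0, 0.0], [0.0, 1.0]]

# Random pre-coding channel T from three new input symbols to the two old ones.
rng = random.Random(1)
T = []
for _ in range(3):
    t = rng.random()
    T.append([t, 1.0 - t])

W_prime = compose_precoding(W, T)
# Candidate symmetrizer for W': first apply T, then U.
U_prime = [[sum(T[ap][a] * U[a][s] for a in range(2)) for s in range(2)] for ap in range(3)]
```

The check succeeds because the symmetry of $\sum_s u(s|a_1)w(b|a_2,s)$ in $(a_1,a_2)$ is preserved when both arguments are averaged with the weights $t(a_1|a_1')\,t(a_2|a_2')$.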

That, even for channels $T$ whose associated matrix $(t(a|a'))_{a'\in\mathcal A',a\in\mathcal A}$ has full rank, the reverse implication “$\mathfrak W'$ is symmetrizable $\Rightarrow$ $\mathfrak W$ is symmetrizable” does not hold came as a surprise and is proven here by explicit example:

Example 1. Define an AVC $\mathfrak W\subset\mathcal C(\{x_1,x_2\},\{1,2,3\})$ by setting

$$w(\cdot|s_1,x_1)=\delta_1, \tag{69}$$

$$w(\cdot|s_2,x_1)=\delta_2, \tag{70}$$

$$w(\cdot|s_1,x_2)=0.6\,\delta_1+0.2\,\delta_2+0.2\,\delta_3, \tag{71}$$

$$w(\cdot|s_2,x_2)=0.1\,\delta_1+0.3\,\delta_2+0.6\,\delta_3, \tag{72}$$

where $\delta_i(j)=1$ if and only if $i=j$ holds for $i,j\in[3]$. Then $\mathfrak W$ is non-symmetrizable: The equation

$$\lambda\,w(\cdot|s_1,x_1)+(1-\lambda)\,w(\cdot|s_2,x_1)=\mu\,w(\cdot|s_1,x_2)+(1-\mu)\,w(\cdot|s_2,x_2) \tag{73}$$

cannot have a solution with $\lambda,\mu\in[0,1]$ because $\delta_3$ appears only on the right hand side and with strictly positive weights.

However, if we add pre-coding by a binary-symmetric channel $N_p$ with parameter $p\in[0,1]$ we obtain the new AVC $\mathfrak W'$ defined via $W'_s := W_s\circ N_p$ or, more concretely, by

$$w'(\cdot|s_1,x_1)=p\,\delta_1+p'\,(0.2\,\delta_1+0.6\,\delta_2+0.2\,\delta_3), \tag{74}$$

$$w'(\cdot|s_2,x_1)=p\,\delta_2+p'\,(0.1\,\delta_1+0.3\,\delta_2+0.6\,\delta_3), \tag{75}$$

$$w'(\cdot|s_1,x_2)=p'\,\delta_1+p\,(0.6\,\delta_1+0.2\,\delta_2+0.2\,\delta_3), \tag{76}$$

$$w'(\cdot|s_2,x_2)=p'\,\delta_2+p\,(0.1\,\delta_1+0.3\,\delta_2+0.6\,\delta_3), \tag{77}$$

where $p' := 1-p$. We set $p=0.4$. The equation

$$\lambda\,w'(\cdot|s_1,x_1)+(1-\lambda)\,w'(\cdot|s_2,x_1)=\mu\,w'(\cdot|s_1,x_2)+(1-\mu)\,w'(\cdot|s_2,x_2) \tag{78}$$

can be written out explicitly into three equations for the two parameters $\mu,\lambda$. The solution is given by

$$\lambda=31/37,\qquad\mu=75/148. \tag{79}$$

This shows that $\mathfrak W'$ is symmetrizable. The situation is depicted as follows:


Figure 3: Light gray lines are the edges of the probability simplex $\mathcal P(\{1,2,3\})$. The sets $\mathrm{conv}(\{w(\cdot|s_1,x_j),w(\cdot|s_2,x_j)\})$ where $j=1,2$ are displayed as dashed lines. The intersection of the dashed lines on the right shows that $\mathfrak W'$ is symmetrizable.
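The stated solution of the symmetrizability equation for the pre-coded channel can be verified with exact rational arithmetic. The following minimal sketch takes the four output distributions of the pre-coded channel exactly as they are printed in (74) to (77), in particular the weights (0.2, 0.6, 0.2) in the first of them, with $p=0.4$; the helper names are chosen for illustration.

```python
from fractions import Fraction as Fr

p, p_bar = Fr(2, 5), Fr(3, 5)  # p = 0.4 and p' = 1 - p


def delta(i):
    """Point mass on output symbol i in {1, 2, 3}."""
    return [Fr(int(j == i)) for j in (1, 2, 3)]


def tenths(a, b, c):
    return [Fr(a, 10), Fr(b, 10), Fr(c, 10)]


def combine(alpha, u, beta, v):
    """Componentwise alpha * u + beta * v."""
    return [alpha * ui + beta * vi for ui, vi in zip(u, v)]


# Output distributions of the pre-coded channel, copied from the displayed equations.
w_s1_x1 = combine(p, delta(1), p_bar, tenths(2, 6, 2))
w_s2_x1 = combine(p, delta(2), p_bar, tenths(1, 3, 6))
w_s1_x2 = combine(p_bar, delta(1), p, tenths(6, 2, 2))
w_s2_x2 = combine(p_bar, delta(2), p, tenths(1, 3, 6))

lam, mu = Fr(31, 37), Fr(75, 148)  # stated solution
left = combine(lam, w_s1_x1, 1 - lam, w_s2_x1)
right = combine(mu, w_s1_x2, 1 - mu, w_s2_x2)
```

Both sides agree in all three components, and both parameters lie in $[0,1]$, so the two line segments intersect as shown on the right of the figure.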


In order to derive the statement of Theorem 2 from Lemma 2 we can therefore not use a simple blocking strategy. Rather, we will present two methods of proof. The first employs a reasoning along the lines of equations (47) until (57). This approach is based on the concept of using a few non-secret bits in order to guarantee secrecy for the actual data. While this is highly interesting from a practical point of view, it does not utilize the full strength of Lemma 2. This proof uses a set of public messages that can be read by Eve but not by James; secrecy is only obtained for the (exponentially larger) set of private messages.

Our second proof of Theorem 2 is based on lifting the optimal pre-codings for $n$ channel uses to $n+1$ channel uses by using no pre-coding on the $(n+1)$th channel use. This type of pre-coding preserves non-symmetrizability. The second proof makes almost full use of the statements of Lemma 2, as we still set $\Gamma=1$. No public messages are used in the construction of the code.

It remains an interesting open question whether, for $n$ channel uses, the optimal channel $U$ arising from the $n$-th term of the optimization problem (59) does in fact symmetrize $(W_{s^n})_{s^n\in\mathcal S^n}$ or not.

Our next result is potentially the most interesting in this work, since it sheds additional light 
on a rather new phenomenon: the super-activation of “the” secrecy capacity of AVWCs. 

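Theorem 5 (Characterization of super-activation of $C_S$ via properties of $C_S^{\mathrm{ran}}$). Let
$(\mathfrak W_i,\mathfrak V_i)_{i=1,2}$ be AVWCs.

1. If $C_S(\mathfrak W_1,\mathfrak V_1) = C_S(\mathfrak W_2,\mathfrak V_2) = 0$, then the estimate

$C_S(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2) > 0$ (80)

is true if and only if $\mathfrak W_1\otimes\mathfrak W_2$ is not symmetrizable and $C_S^{\mathrm{ran}}(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2) > 0$.

If $(\mathfrak W_i,\mathfrak V_i)_{i=1,2}$ can be super-activated it holds

$C_S(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2) = C_S^{\mathrm{ran}}(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2).$ (81)

2. There exist AVWCs which exhibit the above behaviour.

3. If $C_S^{\mathrm{ran}}$ shows super-activation for $(\mathfrak W_1,\mathfrak V_1)$ and $(\mathfrak W_2,\mathfrak V_2)$, then $C_S$ shows super-activation
for $(\mathfrak W_1,\mathfrak V_1)$ and $(\mathfrak W_2,\mathfrak V_2)$ if and only if at least one of $\mathfrak W_1$ or $\mathfrak W_2$ is non-symmetrizable.

4. If $C_S^{\mathrm{ran}}$ shows no super-activation for $(\mathfrak W_1,\mathfrak V_1)$ and $(\mathfrak W_2,\mathfrak V_2)$, then super-activation
of $C_S$ can only happen if $\mathfrak W_1$ is non-symmetrizable and $\mathfrak W_2$ is symmetrizable and
$C_S^{\mathrm{ran}}(\mathfrak W_1,\mathfrak V_1) = 0$ and $C_S^{\mathrm{ran}}(\mathfrak W_2,\mathfrak V_2) > 0$. The statement is independent of the specific
labelling.

Remark 7. Of course, for $\mathfrak W_1\otimes\mathfrak W_2$ to be non-symmetrizable, at least one of
$\mathfrak W_1$, $\mathfrak W_2$ has to be non-symmetrizable.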

While Theorem [5] offers a complete characterization, it does not give any explicit examples -
fortunately this has already been done in m, where two AVWCs were used as follows: The first 
legal link is modeled by an AVC $\mathfrak W_1 = (W_{1,1}, W_{1,2})$ with input system for Alice being $\{1,2\}$ and
output at Bob's site being $\{1,2,3\}$.

ir., - ( { 

(note that we assume that the columns of a matrix representing a channel sum up to one, not the
rows!) and the first link to the eavesdropper by $\mathfrak V_1 = (V_1)$ (no influence from the jammer on


The transition probabilities were given by 
T / ^ \ T 

, Wi,2 = 


0 0 
0 1 


( 82 ) 




that link). For the purpose of this example, it would even be sufficient to let $\mathfrak V_1 = T$. This
channel has the property that $\mathfrak W_1$ is symmetrizable. The second link was chosen to consist of
two binary symmetric channels $W_2, V_2$ where $W_2$ was a degraded version of $V_2$, but both had
nonzero capacity. Thus, $C_S(W_2,V_2) = 0$, but nonetheless it is possible to transmit (non-secret)
messages via $W_2$. This example fits into the third class of pairs of AVWCs described in the
above Theorem [5].

While this explicit example is very interesting, our analysis provides a more systematic treatment.
Note that all our arguments only apply to the strong secrecy criterion. The weak secrecy criterion
can be handled differently, and will be the subject of future work.

As a last point in this section, we would like to discuss connections between $C_{pp}$ and $C_S$. At
first, let us observe a similarity: The former shows super-activation if and only if the latter shows
super-activation. To see this, we argue as follows: By definition, the class of codes which transmit
public and private messages as defined in Definition [5] includes that according to Definition [7]
where no public information is transmitted. Therefore it holds that $C_{pp}(\mathfrak W,\mathfrak V) \ge C_S(\mathfrak W,\mathfrak V)$ for
all AVWCs $(\mathfrak W,\mathfrak V)$. Further, the definition of private/public codes according to Definition [5] is
more narrow than the one of a common randomness assisted code according to Definition [3], so
that every private/public code is at the same time also a common randomness assisted code.
Especially, the public messages may be treated as if they were common randomness if $L = \Gamma$.
Therefore, $C_{pp}(\mathfrak W,\mathfrak V) > 0$ implies that $C_S^{\mathrm{ran}}(\mathfrak W,\mathfrak V) > 0$ for all $(\mathfrak W,\mathfrak V)$. We conclude from
Theorem [2] that $C_{pp}(\mathfrak W,\mathfrak V) > 0$ implies $C_S(\mathfrak W,\mathfrak V) > 0$ for all $(\mathfrak W,\mathfrak V)$. This leads us to conclude
that

$\forall\,(\mathfrak W,\mathfrak V):\quad C_{pp}(\mathfrak W,\mathfrak V) > 0 \;\Leftrightarrow\; C_S(\mathfrak W,\mathfrak V) > 0.$ (83)

Let now $C_{pp}$ show super-activation on $((\mathfrak W_1,\mathfrak V_1),(\mathfrak W_2,\mathfrak V_2))$. Then it follows from the statement
in equation (83) that both $C_S(\mathfrak W_1,\mathfrak V_1) = C_S(\mathfrak W_2,\mathfrak V_2) = 0$ and $C_S(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2) > 0$. Therefore,
super-activation of $C_{pp}$ implies super-activation of $C_S$.

In the reverse direction, let $C_S$ show super-activation on the pair $((\mathfrak W_1,\mathfrak V_1),(\mathfrak W_2,\mathfrak V_2))$. From
the statement in equation (83) we immediately see that $C_{pp}$ shows super-activation as well.
Concerning differences, we note that a question we have to leave open is whether there could
exist AVWCs $(\mathfrak W,\mathfrak V)$ such that $C_{pp}(\mathfrak W,\mathfrak V) > C_S(\mathfrak W,\mathfrak V)$ holds.

This question is of huge practical importance, as it allows the quantification of the interplay 
between private and public communication in interfering networks when i.i.d. assumptions are 
not met, as is often the case. 

4 Proofs 

4.1 Technical definitions and facts 

An important part of our results builds on the mathematical structure that was developed
in [22]. The structure of the codes developed there builds on randomly sampling codewords
which are all taken from one and the same set $T_p$. In our previous paper we used an approach
that was built on sampling codewords according to some pruned distribution $p'$ defined by
$p'(x^n) := \frac{1}{p^{\otimes n}(T^n_{p,\delta})}\,\mathbb 1_{T^n_{p,\delta}}(x^n)\cdot p^{\otimes n}(x^n)$ for some $p \in \mathcal P(\mathcal X)$. The small deviation of $p'$ from $p^{\otimes n}$ brings
with it some benefits concerning asymptotic estimates. Since this work uses the outcomes of the
earlier work [38], it would be desirable to use exactly the same technical approach.

However, due to the intended connection to [22], we cannot use $p'$ in this work. Instead, we
decided to use the same distribution as the one which was used in [22], which is in some sense
further away from $p^{\otimes n}$. While this ensures seamless connectivity to [22], it also made us deviate
(compared to, for example, our previous work [38]) from standard formulations in some other
points, namely: We use a different notion of conditional typicality than before, and we define
typical sets using the relative entropy rather than the one-norm.

This deviation is motivated by the fact that, for any finite alphabet $\mathcal A$ and $p \in \mathcal P(\mathcal A)$ as well as type
$N \in \mathcal P_0^n(\mathcal A)$, we have $p^{\otimes n}(T_N) = \mathrm{poly}(n)\,2^{-nD(N\|p)}$ for some polynomial poly in $n$. Therefore,
defining typicality with respect to relative entropy gives the best control on the asymptotic
behaviour of typical sets. All methods that use other distance measures for the definition of
typicality need to relate these other measures to the relative entropy.
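The relation between the probability of a type class and the relative entropy can be illustrated numerically. The sketch below assumes the standard method-of-types bounds $(n+1)^{-|\mathcal A|}\,2^{-nD(N\|p)} \le p^{\otimes n}(T_N) \le 2^{-nD(N\|p)}$; the distribution `p`, the symbol counts and the function names are hypothetical choices made for illustration only.

```python
from math import comb, log2

def type_class_prob(counts, p):
    """Exact probability of the type class with the given symbol counts under i.i.d. p."""
    size, rest = 1, sum(counts)
    for c in counts:               # multinomial coefficient = size of the type class
        size *= comb(rest, c)
        rest -= c
    prob = float(size)
    for c, px in zip(counts, p):   # every sequence in the class has the same probability
        prob *= px ** c
    return prob

def rel_entropy(N, p):
    """Relative entropy D(N||p) in bits; infinite if N puts mass where p does not."""
    d = 0.0
    for Nx, px in zip(N, p):
        if Nx > 0:
            if px == 0:
                return float("inf")
            d += Nx * log2(Nx / px)
    return d

p = (0.5, 0.3, 0.2)                # hypothetical distribution on a ternary alphabet
counts = (10, 25, 15)              # hypothetical type with n = 50
n = sum(counts)
N = tuple(c / n for c in counts)

prob = type_class_prob(counts, p)
upper = 2 ** (-n * rel_entropy(N, p))
lower = upper / (n + 1) ** len(p)
print(lower <= prob <= upper)
```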

That the use of relative entropy is also elegant as compared to other methods can be seen as
follows: Looking at [20, Definition 2.9] (which deals with typicality in the presence of channels
and inputs to those channels) one sees an additional advantage of using relative entropy over
using the one-norm: defining typicality with respect to variational distance requires one to add
additional assumptions which are not necessary when relative entropy is used, as the latter
quantity can become infinite.

More precisely, let us assume we are given a channel $W \in \mathcal C(\mathcal A,\mathcal B)$ and $(a^n,b^n) \in \mathcal A^n\times\mathcal B^n$
such that for one specific choice of $a, b$ we have $N(a,b|a^n,b^n) > 0$ but $w(b|a) = 0$. Then $b^n$ is
not a typical output of the channel given that its input was $a^n$, since the probability that
it is received when $a^n$ was sent is zero:

$0 \le w^{\otimes n}(b^n|a^n)$ (84)

$= \prod_{i=1}^n w(b_i|a_i)$ (85)

$\le \prod_{i:\,a_i=a,\,b_i=b} w(b_i|a_i)$ (86)

$= w(b|a)^{nN(a,b|a^n,b^n)}$ (87)

$= 0^{nN(a,b|a^n,b^n)}$ (88)

$= 0.$ (89)

Excluding non-typical sequences is crucial for the derivation of lower bounds on the cardinality of
the conditionally typical set, for example. Thus, the above sequence $b^n$ is excluded from the
$w$-typical set given $a^n$ explicitly in [20, Definition 2.9].

A notion using relative entropy captures this perfectly as well, but without necessitating the
explicit exclusion: Let us assume that $b^n$ is said to be $w$-typical given $a^n$ iff $D(a^n,b^n) :=
D(N(\cdot|a^n,b^n)\|p_{AB})$ satisfies $D(a^n,b^n) \le \delta$ for some $\delta > 0$, where $p_{AB}(a,b) := N(a|a^n)\,w(b|a)$. Then
let $a^n$ be given and $b^n$ be such that there exist $a, b$ such that $N(a,b|a^n,b^n) > 0$ but $w(b|a) = 0$.
It follows $p_{AB}(a,b) = 0$ and therefore $D(N(\cdot|a^n,b^n)\|p_{AB}) = \infty$, so that $D(a^n,b^n) = \infty$ and
hence $b^n$ is not $w$-typical given $a^n$.
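The automatic exclusion of impossible output sequences can be made concrete with a small computation. This is a sketch only: the Z-channel `w`, the sequences and the function names are hypothetical, and the divergence is taken between the joint type and $p_{AB}(a,b) = N(a|a^n)\,w(b|a)$ as in the paragraph above.

```python
from collections import Counter
from math import inf, log2

def divergence_to_channel(a_seq, b_seq, w):
    """D(N(.|a^n,b^n) || p_AB) in bits, with p_AB(a,b) = N(a|a^n) * w[a][b]."""
    n = len(a_seq)
    N_a = {a: c / n for a, c in Counter(a_seq).items()}
    N_ab = {ab: c / n for ab, c in Counter(zip(a_seq, b_seq)).items()}
    d = 0.0
    for (a, b), weight in N_ab.items():
        p_ab = N_a[a] * w[a][b]
        if p_ab == 0:
            return inf             # the pair (a, b) occurs but has channel probability zero
        d += weight * log2(weight / p_ab)
    return d

# Hypothetical Z-channel: input 0 is never flipped, input 1 is flipped with probability 1/2.
w = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.5, 1: 0.5}}
a_seq = [0, 0, 1, 1]

possible = divergence_to_channel(a_seq, [0, 0, 1, 0], w)     # consistent with the channel
impossible = divergence_to_channel(a_seq, [0, 1, 1, 0], w)   # contains the pair (0, 1)
print(possible, impossible)
```

Any finite threshold $\delta$ therefore rejects the second output sequence without an extra support condition.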

A brief look at robust typicality as defined in [35] shows that this quantity is also only related
to relative entropy via inequalities.

Therefore, our definition achieves two goals: It connects in the most direct way to the relevant 
probability estimates and can be written down with minimal effort. 

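Thus, the sets which we will be using frequently in the following are, for arbitrary finite sets
$\mathcal A, \mathcal B, \mathcal C$, every $p \in \mathcal P(\mathcal A)$, $v \in \mathcal C(\mathcal A\times\mathcal B,\mathcal C)$ and $\delta > 0$ defined as follows: for a given $(a^n,b^n) \in
\mathcal A^n\times\mathcal B^n$ we define $p_{ABC} \in \mathcal P(\mathcal A\times\mathcal B\times\mathcal C)$ via $p_{ABC}(a,b,c) := N(a,b|a^n,b^n)\,v(c|a,b)$ and

$T^n_{p,\delta} := \{a^n \in \mathcal A^n : D(N(\cdot|a^n)\|p) \le \delta\}$, (90)

$T_{v,\delta}(a^n,b^n) := \{c^n : D(N(\cdot|a^n,b^n,c^n)\|p_{ABC}) \le \delta\}$. (91)

These definitions are only valid for $\delta > 0$. Each $T_{v,\delta}(a^n,b^n)$ obeys the estimate

$v^{\otimes n}(T_{v,\delta}(a^n,b^n)\,|\,a^n,b^n) \ge 1 - 2^{-n\delta/2}$ (92)

for all $n \in \mathbb N$ such that $|\mathcal A\times\mathcal B|\,\tfrac1n\log(2n) \le \delta$. We set, for every $p \in \mathcal P(\mathcal X)$,

$E(p) := \max_{q\in\mathcal P(\mathcal S)} I(p;V_q)$ and $B(p) := \min_{q\in\mathcal P(\mathcal S)} I(p;W_q)$. (93)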

For the technical part of our proofs, the most important tool will be the Chernoff-Hoeffding 
bound: 

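Lemma 4. Let $b$ be a positive number. Let $Z_1,\dots,Z_L$ be i.i.d. random variables with values in
$[0,b]$ and expectation $\mathbb E Z_l = \nu$, and let $0 < \varepsilon < \tfrac12$. Then

$\mathbb P\Big(\frac1L\sum_{l=1}^L Z_l \notin [(1\pm\varepsilon)\nu]\Big) \le 2\exp\Big(-L\cdot\frac{\varepsilon^2\cdot\nu}{3\cdot b}\Big)$, (94)

where $[(1\pm\varepsilon)\nu]$ denotes the interval $[(1-\varepsilon)\nu,(1+\varepsilon)\nu]$.

The proof can be found in [25, Theorem 1.1] and in [6].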
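A quick numerical sanity check of a bound of the form $2\exp(-L\varepsilon^2\nu/(3b))$ is possible for Bernoulli variables, where the deviation probability can be computed exactly. The following sketch assumes $b = 1$ and uses hypothetical values of $L$, $\nu$ and $\varepsilon$; it illustrates the inequality and is not part of the proof.

```python
from math import comb, exp

def exact_deviation_prob(L, nu, eps):
    """P(mean of L i.i.d. Bernoulli(nu) variables lies outside [(1-eps)nu, (1+eps)nu])."""
    lo, hi = (1 - eps) * nu * L, (1 + eps) * nu * L
    return sum(comb(L, k) * nu**k * (1 - nu)**(L - k)
               for k in range(L + 1) if k < lo or k > hi)

def chernoff_bound(L, nu, eps, b=1.0):
    """The bound 2 * exp(-L * eps^2 * nu / (3 * b))."""
    return 2 * exp(-L * eps**2 * nu / (3 * b))

L, nu, eps = 200, 0.5, 0.3         # hypothetical parameters with 0 < eps < 1/2
exact = exact_deviation_prob(L, nu, eps)
bound = chernoff_bound(L, nu, eps)
print(exact <= bound)
```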


4.2 Proof of the converse part of Theorem [1] (coding theorem for Ckey)

The main ingredients of this proof are Fano's inequality, data processing and almost-convexity of
the entropy.


Proof of converse for secret common randomness assisted secrecy capacity. Let a 

sequence K, = (/Cn)“=i of common randomness-assisted codes be given such that for all n e N 
we have 


^n.Kn 


mm — 
Fr 


Kr 


2 e"<{x^\k)wADl\x^)>l-er. 

7,/c=l 


max I{An;3s^) ^ En, 

grig^n 


(95) 

(96) 


and of course limsup„_>a 3 = 0. Set R := liminf^^oo ^ logif„, and G := hm„^oo ^ logF^. In 
addition to the random variable defined in Definition [3l consider b„) distributed as 




(97) 


gng^n 


Then for all n e N, g 6 V{S) and s” e Fano’s inequality implies 


(1 ~ Cn) log Kn ^ I {An', Aqj^\bn) — I{An',‘is‘^) + ^ + C, 


(98)

We can apply the data processing inequality to get

(1 ~ Cn) log Kn ^ /(^; 2)q |bn) — I 3s") + 1 + ^n, (99) 

and from e.g. Lemma 3.4 in [20] and independence of the random variables ^ and h follows 
that the asymptotic scaling of the rate lim inf^^oo ^ log Kn can be upper bounded through the 
following inequality: 


(1 - en) log Kn ^ Vg) ” 3s") + H{bn) + 1+6, 

Since this estimate is valid for all q e V{S) and s” 6 S'^ we get 


Define the distribution p e V{{Kn\) and the channel U e by 

p{k) = U{x^\k) := 2 U\x^\k) {k e [Knl x^ e X^). 

Kn ^ I 


7=1 


Then we arrive at 


( 100 ) 


logiL„ ^ — ( min 1(1^; T)”) - max /(j?,,;3s")^ + . (101) 

l-en\qeP{S) s^eS^ 'J I - En I - en 


( 102 ) 


log Kn ^ ( min /(p; oU)- max I(p; W" « U)] + (103) 

l-en \qeV{S) '' J 1 - e„ 1 - 6^ 


Of course, we can obtain a more relaxed upper bound by optimizing over all $p \in \mathcal P([K_n])$
and $U \in \mathcal C([K_n],\mathcal X^n)$. We then obtain (since $K_n \le |\mathcal X|^n$ for every reliably working code
and, therefore, $\mathcal P([K_n]) \subset \mathcal P([|\mathcal X|^n])$ under the standard embedding $[K_n] \subset [|\mathcal X|^n]$), by further
increasing the size of the input alphabet from $K_n$ to $|\mathcal X|^n$ with $\mathcal U_n := [|\mathcal X|^n]$, that

$R \le \lim_{n\to\infty}\frac1n\max_{p\in\mathcal P(\mathcal U_n)}\max_{U\in\mathcal C(\mathcal U_n,\mathcal X^n)}\Big(\min_{q\in\mathcal P(\mathcal S^n)} I(p;W_q\circ U) - \max_{s^n\in\mathcal S^n} I(p;V_{s^n}\circ U)\Big) + G.$ (104)

As it has been proven in [38] that the capacity equals the leftmost part in the above sum,
we have proven the desired result.

Another obvious bound on the capacity arises by ignoring all security issues: since $\mathcal K$ ensures an
asymptotically perfect transmission, we have

$\lim_{n\to\infty}\frac1n\log K_n \le \max_{p\in\mathcal P(\mathcal X)}\min_{q\in\mathcal P(\mathcal S)} I(p;W_q).$ (105)


This establishes the converse part of the coding theorem. 


□ 


4.3 Proof of the direct part of Theorem [1] (coding theorem for Ckey)

Let G > 0 be given. Define p := argmaxpgp(;f)(il(p) — E{p)). Set G' := max{Ll(p),G}. 
Intuitively speaking, this is the amount of common randomness which can be put to use in the 
obfuscation of Eve. Choose a r > 0 such that z^(r) from Lemma [T] satisfies i^(r) < G'. Let 
n e N be so that for all n ^ there is pn e Roi^) such that |il(pn) — B{p)\ ^ max{r, i^(r)} 
and $|E(p_n) - E(p)| \le \max\{\tau,\nu(\tau)\}$. This can be achieved by approximating $p$ through types $p_n$
via Lemma [8] and since both $B$ and $E$ are continuous functions. Take three sequences
(Ln)n=i) (rn)n=i natural numbers. Without loss of generality, we can ensure that 
satisfies both r„ ^ for all n e N and lim^^oo ^logr„, = G'. Let now n 6 N satisfying 


n ^ N he fixed but large enough such that in addition 

E{p) — G' + 4t ^ — log{Ln) > E{p) — G' + 2r, (106) 

n 

B{p) - E{p) +G' - 4(t + ^ - log(iLn) ^ B{p) - E{p) + G' - 2(t + ^{t)) (107) 

n 

be satisfied, for all large enough n e N. This implies both 

- log(iLn • Ln) ^ B{p) - E{p) +G' - 4(t + v{t)) + E{p) - G' + At (108) 

n 

= B{p) — Av{t) (109) 

^ B{pn) - v{t) (110) 

and 

- log(L„ • r„) ^ Eip) + 2r ^ E{pn) + r. (Ill) 

n 

Asymptotically, we also have this yields 

liminf — \og{Kn) ^ B{p) — E{p) + G — A ■ {t + z^(r)). (112) 

n^co u 


At the same time, the prerequisites of Lemma [1] are met such that a reliable sequence of codes
exists which is also secure with respect to $\|\cdot\|_1$: For all large enough $n \in \mathbb N$ we have


min 

gn 



1 


K-L 


Ws^{D 


kll^kl'y) 


> 


I _ 2-^'^G) 


max 

,k 


1 

L-r 


L,r 

^ Vsn{-\Xkl^) -Evsni-lX"^) 


^2 


'(t)_ 


(113) 

(114) 


It can already be seen that this yields reliable communication at any rate which is strictly below
$B(p) - E(p) + G$: we proved the achievability of rates close enough to $B(p) - E(p) + G$, but it is
clear that time sharing between a trivial strategy where only one codeword is being transmitted
(which is then automatically perfectly secure) and the strategy which was proven to work in
the above will show achievability of all other rates $R \in [0, |B(p) - E(p) + G|^{+}]$. That we also
get secure communication can be seen as follows: From [20, Lemma 2.7] we know that our
exponential bound (62) asymptotically leads to fulfillment of the strong secrecy criterion.

We have thus proven that, for each $\tau' > 0$, the number

$\max_{p\in\mathcal P(\mathcal X)}\Big(\min_{q\in\mathcal P(\mathcal S)} I(p;W_q) - \max_{q\in\mathcal P(\mathcal S)} I(p;V_q)\Big) + G - \tau'$ (115)

is an achievable rate. We now proceed by adding channels $U$ at the sender and using blocks
of the original channels together: We now know that, for every $r \in \mathbb N$, $G > 0$ and $\delta > 0$,
$p \in \mathcal P(\mathcal U_r)$ where $\mathcal U_r := [|\mathcal X|^r]$ and $U \in \mathcal C(\mathcal U_r,\mathcal X^r)$ there exist sequences $\mathcal K = (\mathcal K_m)_{m=1}^\infty$ such that
for every $s^{r\cdot m} \in (\mathcal S^r)^m = \mathcal S^{r\cdot m}$ we have

1 -I Ttti 

ZZZ e(x^-nk,7)^Vsrr^r(D^l^r.m^ ^ 1 - e 

J-m 


(116) 


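where $x_{k,\gamma} \in \mathcal U_r^m$ are codewords (each $x_{k,\gamma,i}$ is an element of $\mathcal U_r$) for $(W_{s^r}\circ U)_{s^r\in\mathcal S^r}$, and the
stochastic encoder is $e(x^{r\cdot m}|k,\gamma) = \prod_{i=1}^m U(x_i|x_{k,\gamma,i})$ for $x^{r\cdot m} = (x_1,\dots,x_m) \in (\mathcal X^r)^m$, and it holds that

$\liminf_{m\to\infty}\frac1m\log K_m \ge \min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U) + r\cdot G - \delta.$ (117)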

We can define values $t_n \in \{0,\dots,r-1\}$ by requiring $n = m\cdot r + t_n$ to hold for some
suitably chosen $m = m(n) \in \mathbb N$. This quantity satisfies $-1 + n/r \le m(n) \le n/r$. For every $n \in \mathbb N$
we then define new decoding sets by


■■= Dl X y- (118) 

and new codewords by setting for some arbitrary but fixed x*** 

Xk^ := {xk^,x*'^). (119) 
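The block decomposition $n = m\cdot r + t_n$ and the extension of a length-$m\cdot r$ codeword by $t_n$ fixed symbols can be sketched as follows. The function names and the filler symbol are hypothetical; the loop only checks the stated bounds $-1 + n/r \le m(n) \le n/r$ for a range of blocklengths.

```python
def split_blocklength(n, r):
    """Write n = m*r + t with t in {0, ..., r-1} and return (m, t)."""
    return divmod(n, r)

def pad_codeword(block_codeword, t, filler):
    """Append t copies of a fixed filler symbol to a codeword of length m*r."""
    return list(block_codeword) + [filler] * t

r = 3
checks = []
for n in range(1, 50):
    m, t = split_blocklength(n, r)
    checks.append(n == m * r + t and 0 <= t < r and n / r - 1 <= m <= n / r)

# A codeword for m = 2 blocks of length r = 3, extended to blocklength n = 8.
padded = pad_codeword([0, 1, 1, 0, 1, 0], split_blocklength(8, r)[1], filler=0)
print(all(checks), len(padded))
```

Since the filler positions carry no message information, the number of codewords is unchanged while the blocklength grows by at most $r - 1$, which is why the rate loss vanishes asymptotically.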


From the choice of codewords and the decoding rule it is clear that this code is asymptotically
reliable. The asymptotic number of codewords (mind that $K_n = K_{m(n)}$), calculated and
normalized with respect to $n$, is

$\liminf_{n\to\infty}\frac1n\log K_n = \liminf_{n\to\infty}\frac{1}{m(n)\cdot r + t_n}\log K_{m(n)}$ (120)

$\ge \liminf_{n\to\infty}\frac1r\cdot\frac{1}{m(n)+1}\log K_{m(n)}$ (121)

$= \liminf_{n\to\infty}\frac1r\cdot\frac{m(n)}{m(n)+1}\cdot\frac{1}{m(n)}\log K_{m(n)}$ (122)

$= \frac1r\liminf_{n\to\infty}\frac{1}{m(n)}\log K_{m(n)}$ (123)

$\ge \frac1r\Big(\min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U) + r\cdot G - \delta\Big).$ (124)


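To see that every number $C^*(\mathfrak W,\mathfrak V) - \varepsilon$ is an achievable rate, take $r$, $U$ and $p$ such that

$C^*(\mathfrak W,\mathfrak V) - \frac{\varepsilon}{2} \le \frac1r\Big(\min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U)\Big).$ (125)

This is possible since in [38] it was (in addition to the equality $C_S^{\mathrm{ran}}(\cdot,\cdot) = C^*(\cdot,\cdot)$) proven that

$C^*(\mathfrak W,\mathfrak V) = \lim_{r\to\infty}\frac1r\max_{p\in\mathcal P(\mathcal U_r)}\max_{U\in\mathcal C(\mathcal U_r,\mathcal X^r)}\Big(\min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U)\Big).$ (126)

We set $\delta = r\cdot\varepsilon/4$. Then from our preceding arguments it becomes clear that there is a sequence
$\mathcal K$ of asymptotically reliable codes at an asymptotic rate

$\liminf_{n\to\infty}\frac1n\log K_n \ge \frac1r\Big(\min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U) + r\cdot G - r\cdot\varepsilon/4\Big)$ (127)

$= \frac1r\Big(\min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U) + r\cdot G\Big) - \varepsilon/4$ (128)

$\ge C^*(\mathfrak W,\mathfrak V) + G - \varepsilon/2 - \varepsilon/4$ (129)

$\ge C^*(\mathfrak W,\mathfrak V) + G - \varepsilon.$ (130)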

This proves the direct part of the coding theorem. 

4.4 An intermediate result 

We now have to prove the core results from which all the other statements can be deduced.
The idea of proof will be to make a random selection of the codewords $x_{kl\gamma}$ where $k$ are the
messages, $l$ are non-secret messages which are only being sent in order to obfuscate the received
signal at Eve, and $\gamma$ are the values of the common randomness. When applying the results
to AVWCs, the decoder is the one defined in [22] whenever we study $C_S$ and is defined here
according to our needs for the study of $C_{\mathrm{key}}$.

We define events $E_1,\dots,E_5$ which describe certain desirable properties of our codewords, in
dependence of $(\mathfrak W,\mathfrak V)$ and the numbers $K, L, \Gamma$ of available indices $k, l, \gamma$. We then use Chernoff
bounds. This guarantees that the random selection of codewords has each single property we
would like them to have with probability lower bounded by $1 - \exp(-2^{nc})$ for some positive
constant $c > 0$ and all large enough $n$ under some conditions on $\Gamma$, $L$ and $K$ which of course
depend on $(\mathfrak W,\mathfrak V)$ as well. Application of a union bound then reveals the existence of one
particular choice of codewords that has all the desired properties simultaneously.
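The union bound step can be written out as a short worked estimate. As a simplifying assumption made only for this illustration, suppose that each of five events $E_1,\dots,E_5$ has probability at least $1 - \exp(-2^{nc})$ with one common constant $c > 0$ (prefactors such as the one in Lemma 5 are ignored here):

```latex
\begin{align*}
\mathbb{P}\Big(\bigcap_{i=1}^{5} E_i\Big)
  &= 1 - \mathbb{P}\Big(\bigcup_{i=1}^{5} E_i^{c}\Big) \\
  &\ge 1 - \sum_{i=1}^{5} \mathbb{P}\big(E_i^{c}\big) \\
  &\ge 1 - 5\exp\big(-2^{nc}\big) \;>\; 0
  \qquad \text{for all large enough } n .
\end{align*}
```

An event of positive probability is non-empty, so at least one realization of the codewords lies in all five events at once.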

Using exactly this method of proof, Csiszár and Narayan [22, Lemma 3] proved properties
(65), (66) and (67) of Lemma [2]. Thus what remains for us is to provide proof that the remaining
event (68) has high probability.

In [22], large deviation results for dependent random variables were employed, but the
underlying probability distribution employed in codeword selection was the same as the one used by us, so
that our findings connect seamlessly.

We become a bit more concrete now. Let $p \in \mathcal P_0^n(\mathcal X)$, $q \in \mathcal P_0^n(\mathcal S)$. Throughout, we will
attempt to twist and tweak asymptotic quantities such that they are calculated with respect to
the random variables $(S,X,Z)$ defined via $\mathbb P((S,X,Z) = (s,x,z)) := p(x)q(s)v(z|x,s)$. Since
the distribution of $(S,X,Z)$ is so important, we label it by $p_{SXZ}$. The variable $p$ will remain
fixed, and $q$ will always denote a type corresponding to one of the choices of James.

The proof will require us to draw codewords at random. As stated already, we adapt this
procedure to the one chosen in [22]. This is done as follows: We define the random variables
$X_{kl\gamma}$ ($1 \le k \le K$, $1 \le l \le L$, $1 \le \gamma \le \Gamma$) by $\mathbb P(X_{kl\gamma} = x^n) := \frac{1}{|T_p|}\mathbb 1_{T_p}(x^n)$ for all
$k \in [K]$, $l \in [L]$, $\gamma \in [\Gamma]$ and $x^n \in \mathcal X^n$, where $K, L, \Gamma$ are natural numbers. We write $x_{kl\gamma}$ for
the realizations of the variable $X_{kl\gamma}$, instead of $x^n_{kl\gamma}$. The random variable $\mathbf X := (X_{kl\gamma})_{k,l,\gamma=1}^{K,L,\Gamma}$ is
distributed such that each $X_{kl\gamma}$ is independent of $X_{k'l'\gamma'}$ if $(k,l,\gamma) \ne (k',l',\gamma')$. The realizations
of $\mathbf X$ are written $\mathbf x$. We use the projections $\pi_{kl\gamma}$ defined by $\pi_{kl\gamma}(\mathbf x) := x_{kl\gamma}$. Further
projections, as e.g. $\pi_k(\mathbf x) := (x_{kl\gamma})_{l,\gamma=1}^{L,\Gamma}$, are defined wherever there is a need.

In order to enhance readability, we will not only omit the superscript $n$ in our codewords,
but from time to time we will also write statements like "$\forall s^n$, property P holds". Then, it is
understood that P holds for all $s^n \in \mathcal S^n$.

When calculating expectations of any of the $X_{kl\gamma}$ we need no reference to $k, l, \gamma$ due to
independence of our random variables. We therefore add another random variable, $X^n$,
distributed as $\mathbb P(X^n = x^n) = \frac{1}{|T_p|}\mathbb 1_{T_p}(x^n)$ as well.

A first and crucial step for all that is to come in the proofs of the technical Lemmas [1] and
[2] is to fix some $\delta > 0$ and $p \in \mathcal P(\mathcal X)$ and define, for all $s^n \in \mathcal S^n$ and $z^n \in \mathcal Z^n$, the functions
$\Theta_{s^n,z^n} : \mathcal X^n \to [0,b]$ (where $b$ is determined by some function $f_2$, as we will see soon)
by

$M(s^n,z^n) := \{x^n \in T_p : D(N(\cdot|s^n,x^n,z^n)\|p_{SXZ}) \le \delta\}$ (131)

$\Theta_{s^n,z^n}(x^n) := v^{\otimes n}(z^n|s^n,x^n)\,\mathbb 1_{M(s^n,z^n)}(x^n).$ (132)

In order to enhance readability, the dependence of both $M$ and $\Theta$ on $\delta$ is suppressed here and in
the following. All our proofs rely on a common strategy, which only deviates in one point: the
codes which ensure reliable transmission. For non-symmetrizable AVWCs we rely on the work
[22] and use the codes which are defined therein. This will be sufficient to obtain all the results
that we claimed for the uncorrelated coding secrecy capacity.

The coding theorem for secret common randomness assisted secrecy capacity needs an additional 
definition of codes. This definition is as follows: 

For every n e N, set := Vq[S). For every x”, define (not necessarily disjoint) “decoding” 
sets by 




and for a collection x.y := {xki-y)j^l^i of codewords with fixed value of 7 set 

D{^'y)kl ■= n f U U ■ 


(133) 


(134) 


This defines the code $\mathcal K_\gamma$. This definition allows the decoder to decode the randomization index
$l$ as well, an approach which works for AVWCs and compound (wiretap) channels with convex
state sets via the minimax theorem. Note that this code will only ensure reliable transmission
if $\Gamma$ is sufficiently large.

In order to deliver a joint treatment of the subject it makes sense to define the following events,
where we implicitly assume a functional dependence $\delta = \delta(\tau)$ that will be specified more exactly
later during our proofs. The sets $E_3,\dots,E_5$ depend only on $\tau$, whereas $E_1$ depends also on $\mathfrak V$
and $E_2$ on $\mathfrak W$.


El := ] x|Vs^z^A; : ^ ^ ^ [(1 + 2—/4)E0,.,,n] 


/,7=1 


( I r 

£^2 := ^ x| min — V ^1 — 2-2 


—n5/4 


7=1 


^3 := ]x| max^ \{{k,l) : (x”,xm^,s”) e T>(.|^n ,n)}| ^ 2*^ 

7,x ,s 

:= |x| max|{(fc, Z) : /(xfci^;s”) > r}| ^ K ■ L- 2“"''^| 


^;4 : = 

E^ := ■( x| max 


7,s" 


. There is ¥= {k,l,'y) such that 

’ ' lAlY^^k'l'-y',S^) -\R-I{^klY,s'^)\+ > T 


(135) 

(136) 

(137) 

(138) 

^K- L- 


(139) 


The average success probability $d_{s^n}(\mathcal K_\gamma)$ was defined in Definition [3]. The events $E_3, E_4, E_5$ are
proven to have high probability in [22] (actually, their proof is valid for $|\Gamma| = 1$ but can be
extended to arbitrary $|\Gamma|$ by simple union bounds), which leads to the following statement:

Lemma 5 (Cf. [22]). There is $c' > 0$ such that, if $\mathfrak W$ is non-symmetrizable, we have that

$\mathbb P(E_3 \cap E_4 \cap E_5) \ge 1 - \Gamma\cdot\exp(-2^{nc'})$ (140)

The bound in Lemma [5] is trivial whenever $\Gamma > \exp(2^{nc'})$. In the applications intended here,
the maximal scaling of $\Gamma$ with $n$ will be exponential, so that nontrivial bounds arise.

Our main effort in the following will be to show that a similar bound is true for $\mathbb P(E_1)$ and
$\mathbb P(E_2)$ under the right conditions on $K$, $L$ and $\Gamma$. With respect to these conditions, any of the
intersections $E_1 \cap \dots \cap E_5$ will then have very high probability as well.

For the proofs of both Lemma [1] and [2] it will be of importance to control the amount of
information which leaks out to Eve. This will require us to prove that a careful random choice
of codewords will be provably secure, and this is the main content of the following Lemma
(which contains statements concerning the message transmission capabilities of the common
randomness assisted codes defined in (133) and (134) as well).

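Lemma 6. Let $K, L, \Gamma \in \mathbb N$. Let the random variable $\mathbf X$ be as described above. Then for every
$\tau > 0$ and $\beta > 0$ there is a $\delta > 0$ and an $N \in \mathbb N$ such that for all $n \ge N$ and types $p \in \mathcal P_0^n(\mathcal X)$,
the following statements are true:

1. If $\frac1n\log(L\cdot\Gamma) \ge E(p) + \tau$ and $\min_{x:p(x)>0} p(x) \ge \beta$, then

$\mathbb P(E_1) \ge 1 - 2\cdot|\mathcal S\times\mathcal X\times\mathcal Z|^n\cdot\exp(-2^{n\tau/6}).$

2. If $\frac1n\log(K\cdot L) \le B(p) - \delta - 2\cdot f_1(\sqrt{2\delta})$, then $\mathbb P(E_2) \ge 1 - \exp(n\cdot\log(|\mathcal S|) - \Gamma\cdot 2^{-n\delta})$.

3. For every $\beta > 0$, $|\mathcal X|$, $|\mathcal S|$ and $|\mathcal Z|$, a functional dependence between $\delta$ and $\tau$ can be chosen
such that $\lim_{\tau\to0}\delta(\tau) = 0$.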

The number $N$ depends on $|\mathcal X|$, $|\mathcal S|$, $|\mathcal Z|$ as well as on $p$ (via the quantity $\beta := \min_{x\in\mathcal X:p(x)>0} p(x)$)
and on $\delta$.

Proof. Some of the statements we wish to prove here are not about the full random variable
$\mathbf X = (X_{kl\gamma})_{k,l,\gamma=1}^{K,L,\Gamma}$ but only about exponentially many parts of it. We therefore feel the need
to write a few lines concerning our strategy of proof. We adopt the usual point of view that $\mathbf X$
somehow generates matrices of codewords. In the special case treated here it will be convenient
to think of realizations of $\mathbf X$ as a list of $\Gamma$ matrices, all of which describe a code-book, and each of
these code-books uses the index $l$ solely for obfuscating Eve, while $k$ is used to transmit
messages. The fact that $\gamma$ is known to both the sender and the receiver lets the receiver adapt
his decoder appropriately, while Eve only sees the average over all code-books. The effective
randomness used for obfuscation of Eve is therefore $L\cdot\Gamma$.

Before making this more precise, we need additional notation:

As stated already, the projections $\pi_{kl\gamma} : (\mathcal X^n)^{K\cdot L\cdot\Gamma} \to \mathcal X^n$ project onto the copy of $\mathcal X^n$
corresponding to $(k,l,\gamma)$ such that $\pi_{kl\gamma}(\mathbf X) = X_{kl\gamma}$. Accordingly, $\pi_k$ are the projections mapping $\mathbf X$
to $X_k := (X_{kl\gamma})_{l,\gamma=1}^{L,\Gamma}$.

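The trick will be to first understand how to embed statements concerning only certain projections
of $\mathbf X$ into the whole random selection process. The idea is to proceed as follows:

Take any set of functions $g_1,\dots,g_M : \mathcal X^n \to [0,b']$. Then for all $k \in [K]$,

$\mathbb P\Big(\frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma} g_m(\pi_{kl\gamma}(\mathbf X)) \notin [(1\pm\varepsilon)\mathbb E g_m]\Big) = \mathbb P\Big(\frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma} g_m(\pi_{l\gamma}(X_k)) \notin [(1\pm\varepsilon)\mathbb E g_m]\Big),$ (141)

where the left hand side is a probabilistic statement about $\mathbf X = (X_{kl\gamma})_{k,l,\gamma=1}^{K,L,\Gamma}$ and the right hand
side is a statement about the random variables $X_k = (X_{kl\gamma})_{l,\gamma=1}^{L,\Gamma}$. Thus by the usual Chernoff
bound, Lemma [4], we have

$\mathbb P\Big(\exists\, m,k : \frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma} g_m(\pi_{l\gamma}(X_k)) \notin [(1\pm\varepsilon)\mathbb E g_m]\Big) \le 2\cdot M\cdot K\cdot\exp\Big(-\varepsilon^2\cdot\frac{L\cdot\Gamma\cdot\min_m\mathbb E g_m}{3\cdot b'}\Big).$ (142)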


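Another crucial connection in what is to follow is that for all $x^n$, $z^n$ and $s^n$ we have (using the
abbreviation $N(\cdot) := N(\cdot|s^n,x^n,z^n)$ and $r(z|x,s) := N(s,x,z)/N(s,x|s^n,x^n)$):

$v^{\otimes n}(z^n|s^n,x^n) = \prod_{s,x,z} v(z|x,s)^{nN(s,x,z)}$ (143)

$= 2^{\,n\sum_{s,x,z} N(s,x,z)\log v(z|x,s)}$ (144)

$= 2^{\,n\sum_{s,x,z} N(s,x,z)\left(\log\frac{p(x)q(s)v(z|x,s)}{N(s,x,z)} + \log\frac{N(s,x|s^n,x^n)}{p(x)q(s)} + \log r(z|x,s)\right)}$ (145)

$= 2^{\,n\left(-D(N(\cdot|s^n,x^n,z^n)\|p_{XSZ}) + D(N(\cdot|s^n,x^n)\|p\otimes q) - H(Z|S,X)\right)}$ (146)

where $SXZ$ is distributed according to $N$ (note that without loss of generality we may assume
that $p, q > 0$ here and in the following lines, since otherwise we could simply erase a symbol
from the alphabet $\mathcal X$ or $\mathcal S$).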


Proof of property 1 of Lemma [6]. Let $n \in \mathbb N$. Replace $M$ with $|\mathcal S^n\times\mathcal Z^n|$ and the
functions $g_m$ with the $\Theta_{s^n,z^n}$'s. We let $\delta > 0$ be arbitrary for the moment. Using equation (143)
and the fact that the relative entropy is never negative it can be seen that each $\Theta_{s^n,z^n}$ obeys

$\Theta_{s^n,z^n}(x^n) \le v^{\otimes n}(z^n|s^n,x^n)$ (147)

$\le 2^{-n\left(H(Z|SX) - D(N(\cdot|s^n,x^n)\|p\otimes q)\right)}.$ (148)

This bound does obviously still depend on $x^n$. But if $x^n \in M(s^n,z^n)$ then the distribution of
$SXZ$ has the following important feature: by Pinsker's inequality, we have

$\|N - p_{SXZ}\|_1 \le \sqrt{2\delta}.$ (149)
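Pinsker's inequality, which converts a bound on relative entropy into a bound on the one-norm, can be checked numerically. The sketch below assumes relative entropy measured in bits, for which the sharp form reads $\|P - Q\|_1 \le \sqrt{2\ln 2\cdot D(P\|Q)}$ and hence also $\|P - Q\|_1 \le \sqrt{2\,D(P\|Q)}$; the two distributions are hypothetical.

```python
from math import log, log2, sqrt

def rel_entropy(P, Q):
    """D(P||Q) in bits; assumes Q(x) > 0 wherever P(x) > 0."""
    return sum(p * log2(p / q) for p, q in zip(P, Q) if p > 0)

def l1_distance(P, Q):
    """One-norm distance between two distributions on the same alphabet."""
    return sum(abs(p - q) for p, q in zip(P, Q))

P = (0.5, 0.3, 0.2)                # hypothetical empirical distribution
Q = (0.4, 0.4, 0.2)                # hypothetical reference distribution
D = rel_entropy(P, Q)

sharp = sqrt(2 * log(2) * D)       # Pinsker with the constant for bits
loose = sqrt(2 * D)                # weaker bound of the form sqrt(2 * delta)
print(l1_distance(P, Q) <= sharp <= loose)
```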


Setting f 2 { 6 ) := 2- + 6 , an application of Lemma [TT] from the appendix together with 

monotonicity of the relative entropy then yields 

Vx’^eT": 0,„,^u(x”) ^ (150) 

Here, $f_1$ is defined setting $\mathcal A = \mathcal S\times\mathcal X\times\mathcal Z$. This justifies our choice of $b$. Note that the
definition of $\Theta$ together with the monotonicity of $D(\cdot\|\cdot)$ ensures that the empirical distribution
$N(\cdot|x^n,s^n)$ is almost product ($N(\cdot,\cdot|x^n,s^n) \approx p(\cdot)\cdot N(\cdot|s^n)$) and that this property was vital
in the derivation of the results contained in [22], whereas it may not be strictly necessary here
(but does lead to a valid strategy of proof, nonetheless).

In order to apply the Chernoff bound we also need to calculate the expectation of each $\Theta_{s^n,z^n}$,
and for that matter it will be important to obtain a tight enough lower bound on $|M(s^n,z^n)|$:
According to Lemma [9] from the appendix (set $\mathcal A = \mathcal X$ and $\mathcal B = \mathcal S\times\mathcal Z$ there) we have


We are now almost ready to give a lower bound on the expectation of $\Theta_{s^n,z^n}$. Be aware that
$s^n$ of type $q$ and $z^n$ remain fixed quantities for the moment. From monotonicity of the relative
entropy and Pinsker's inequality applied together with Lemma [11] it follows that we can estimate


It then follows that, if $M(s^n,z^n) \ne \emptyset$, we have the estimate
E0s",2" = ^ 2 V®^{z^\x^,s'^) 


' x^eM{s^,z^) 

^ 2 -n{H{Z\X,S)+ 25 +hiV^)) . 2”(^W-^c:(n)) | ^ 

Estimate (149) together with the continuity of entropy yields (see [20, Lemma 2.7])


(152) 

(153) 

(154) 

(155) 


We define $m : \mathcal S^n\times\mathcal Z^n \to \{0,1\}$ by $m(s^n,z^n) = 1$ if $M(s^n,z^n) \ne \emptyset$ and $m(s^n,z^n) = 0$ else. It

then follows that for all large enough $n \in \mathbb N$


E 0 s", 2 " ^ m(s",0) • 


(156)

where $f_3(\delta) := 4(\delta + f_1(\sqrt{2\delta}))$. For our random variable $\mathbf X$ this can be used as follows: via the
Chernoff bound,


L,r 


F{3k, s^, z^: Y, 0s^.n(7^fc/^(X))^[(l + e)E0,n,,n]) 


L-r 

/,7^1 

I _ ^ f millcn ]E0o^ z'^ 

^ 2 • |<S X d:” X • exp -e^ • L • r 


3-6 


2 min^n E0s" 2>i 

= c(nj • exp I —e ■ I ■ L ■ — 


3-6 


(157) 

(158) 

(159) 


on account of the same argument that we used in equations (141) and (142) and with the
obvious definition of $c(n)$. Now we have to plug in the asymptotic behaviour of $L\cdot\Gamma$, $\varepsilon$ and $b$. If
$m(s^n,z^n) = 0$ then the statement is trivial. We set $f(\delta) := f_2(\delta) + f_3(\delta)$, $E(p) := \max_q I(p;V_q)$
and let $\frac1n\log(L\cdot\Gamma) \ge E(p) + \tau$ for some $\tau > 0$. Note that, no matter what the distribution of $S$
(which depends on the choice of James!), we have $E(p) - I(X;Z|S) \ge 0$. Therefore,


E 05 n 

— • L • F- 

3 b 


- V , y 3 

= mis’" 

V , ; 3 

> ml's” z”!— • 

- V , y 3 

,2 

= m(s”,z”)--2”(”-^(^)). 
3 


(160) 

(161) 

(162) 

(163) 


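Upon choosing $\varepsilon = 2^{-n\alpha}$ we get a doubly exponential decay of the probability in equation (157)
if $0 < \tau - 2\alpha - f(\delta)$, and since $\lim_{\delta\to0} f(\delta) = 0$ there is a combination of $\delta > 0$, $\tau > 0$ such that
for $\alpha = \tau/6$ and all large enough $n \in \mathbb N$ we have

$\mathbb P\Big(\exists\, k,s^n,z^n : \frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma}\Theta_{s^n,z^n}(X_{kl\gamma}) \notin [(1\pm2^{-n\tau/6})\,\mathbb E\Theta_{s^n,z^n}]\Big) \le c(n)\cdot\exp(-2^{n\tau/6}).$ (164)

It is clear that this defines a dependence $\delta = \delta(\tau)$ and that $\lim_{\tau\to0}\delta(\tau) = 0$ and $\delta(\tau) > 0$ for all
(small enough) $\tau$. A specific choice that we will use here is $\delta(\tau) = \tau$.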


Proof of statement 2 of Lemma [6]. We will need Ahlswede's robustification technique.

Lemma 7 (Oil]). If a function $f : \mathcal S^n \to [0,1]$ satisfies

$\sum_{s^n\in\mathcal S^n} f(s^n)\,q(s_1)\cdots q(s_n) \ge 1 - \varepsilon$ (165)

for all $q \in \mathcal P_0^n(\mathcal S)$ and some $\varepsilon \in [0,1]$, then

$\frac{1}{n!}\sum_{\pi\in S_n} f(\pi(s^n)) \ge 1 - 3\cdot(n+1)^{|\mathcal S|}\cdot\varepsilon.$ (166)

We will in the following make use of the codes $\mathcal K_\gamma$ which defined the set $E_2$.
We would like to use the Chernoff bound for the variable $\Gamma$, so we have to control the expectation
for each fixed $\gamma$. Note that the construction of codes is such that it is independent from $\gamma$, so
this will not turn into a hopeless case if we draw an independent number $\Gamma$ of realizations of
above codes. We go as follows: First associate to any given choice $\mathbf x_\gamma$ of codewords
the corresponding code $\mathcal K(\mathbf x_\gamma)$ as defined in equations (133) and (134). Then, for every $s^n$ and
$\gamma \in [\Gamma]$, define the success probability of that code via

$d_{s^n}(\mathbf x_\gamma) := \frac{1}{K\cdot L}\sum_{k,l=1}^{K,L} W_{s^n}\big(D(\mathbf x_\gamma)_{kl}\,\big|\,x_{kl\gamma}\big).$

We then have for each fixed $\gamma$


1 


K,L 




1 


k,l=l 

K,L 




k,l=l 


^ 1j - K ■ L ■ ^ 

x'^,x"eTp ' P' 


_ T , 

-sTj, FPI 


k'^kU^l 
1 
rt 


(167) 


(168) 

(169) 

(170) 


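Now observe that $\pi(T_p) = T_p$ for every $\pi\in S_n$ and that, for all $\pi\in S_n$, $x^n$, $y^n$ and $s^n$ we have $w_{\pi(s^n)}(\pi(y^n)|\pi(x^n)) = w_{s^n}(y^n|x^n)$. In addition to that, $D_{\pi(x^n)} = \pi(D_{x^n})$, so that we can write

$$\mathbb E d_{s^n}(\mathbf X_\gamma) \ge \frac{1}{n!}\sum_{\pi\in S_n} w_{s^n}(D_{\pi(x^n)}|\pi(x^n)) - K\cdot L\cdot\sum_{x^n,\hat x^n\in T_p}\frac{1}{|T_p|^2}\,w_{s^n}(D_{x^n}|\hat x^n). \qquad (171)$$

By Lemma 2.3 and equation (2.1) in [20], the density $\frac{1}{|T_p|}\mathbb 1_{T_p}$ satisfies

$$\frac{1}{|T_p|}\mathbb 1_{T_p} \le (n+1)^{|\mathcal X|}\,2^{-nH(p)}\,\mathbb 1_{T_p} \qquad (172)$$

$$= (n+1)^{|\mathcal X|}\,p^{\otimes n}\,\mathbb 1_{T_p} \qquad (173)$$

$$\le (n+1)^{|\mathcal X|}\,p^{\otimes n}. \qquad (174)$$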
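The estimate (172) is the standard type-class bound $|T_p| \ge (n+1)^{-|\mathcal X|}\,2^{nH(p)}$. The following minimal Python sketch checks it by brute force for small block lengths. All function names are illustrative and not taken from the paper.

```python
import math
from itertools import product

def type_class_size(counts):
    """Number of sequences with the given symbol counts (multinomial coefficient)."""
    size = math.factorial(sum(counts))
    for c in counts:
        size //= math.factorial(c)
    return size

def entropy_bits(counts):
    """Shannon entropy in bits of the type p = counts / n."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def density_bound_holds(counts):
    """Check 1/|T_p| <= (n+1)^{|X|} * 2^{-n H(p)}, i.e. inequality (172) on T_p."""
    n = sum(counts)
    lhs = 1.0 / type_class_size(counts)
    rhs = (n + 1) ** len(counts) * 2.0 ** (-n * entropy_bits(counts))
    return lhs <= rhs * (1 + 1e-9)  # small slack for floating point

def all_types(n, alphabet_size):
    """All count vectors of length alphabet_size that sum to n."""
    return [c for c in product(range(n + 1), repeat=alphabet_size) if sum(c) == n]
```

Calling `all(density_bound_holds(c) for c in all_types(8, 3))` runs the check over every type of block length 8 on a ternary alphabet.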

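Setting $pl(n) := (n+1)^{|\mathcal X|}$, we use this to further develop our bound as follows:

$$\mathbb E d_{s^n}(\mathbf X_\gamma) \ge \frac{1}{n!}\sum_{\pi\in S_n} w_{s^n}(D_{\pi(x^n)}|\pi(x^n)) - K\cdot L\cdot pl(n)\cdot\sum_{x^n\in T_p}\frac{1}{|T_p|}\,w_p^{\otimes n}(D_{x^n}|s^n) \qquad (175)$$

$$= \frac{1}{n!}\sum_{\pi\in S_n} w_{s^n}(D_{\pi(x^n)}|\pi(x^n)) - K\cdot L\cdot pl(n)\cdot\frac{1}{n!}\sum_{\pi\in S_n} w_p^{\otimes n}(D_{\pi(x^n)}|s^n), \qquad (176)$$

where $x^n\in T_p$ is arbitrary and $W_p(y|s) = \sum_{x\in\mathcal X} p(x)\,w(y|s,x)$ according to our earlier definition. By carrying out the same estimate as in equation (172) for the distribution induced by the type $q$ of $s^n$ and setting $pl_2(n) := (n+1)^{2\cdot\max\{|\mathcal X|,|\mathcal S|\}}$ we get (note here that $W_{p\otimes q}(y) := \sum_{s,x} q(s)\,p(x)\,w(y|s,x)$ defines, according to our convention, a probability distribution in $\mathcal P(\mathcal Y)$ which is identical to $W(p\otimes q)$)

$$\mathbb E d_{s^n}(\mathbf X_\gamma) \ge \frac{1}{n!}\sum_{\pi\in S_n} w_{s^n}(D_{\pi(x^n)}|\pi(x^n)) - K\cdot L\cdot pl_2(n)\cdot w_{p\otimes q}^{\otimes n}(D_{x^n}) \qquad (177)$$

$$= \frac{1}{n!}\sum_{\pi\in S_n} w_{\pi(s^n)}(D_{x^n}|x^n) - K\cdot L\cdot pl_2(n)\cdot w_{p\otimes q}^{\otimes n}(D_{x^n}) \qquad (178)$$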

^ A 2 ^ 7 r(s")(^a;"|a;’') - K-L- pl 2 (n) • max'u;^g(TM/^, 5 (x'^)). (179) 

77-. „ 

TTGOn 


It is now time to apply Ahlswede's robustification technique. For the fixed but arbitrary $x^n\in T_p$ define $f$ by fixing all its values $f(s^n)$ via $f(s^n) := w_{s^n}(D_{x^n}|x^n)$. Then by Lemma 7 we get

E(isn(X.y) ^ 1 — (n + l)!*^! maxu;®"'(.D^Ti|x"') — K - L - pl 2 (n) • maxtc^q(riy^^, 5 (x"')) (180) 


> 


1 - pl2(n) ( maxlFf’^(Tu^, 5(x’^)'^|x”) + 


CgS. 


+K - L - maxw'^q{Tw^,s{x"'))) 


^ 1 - pl 2 (n) ( 2 ^K-L- max 

\ xeX 


(181) 

(182) 

(183) 
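The robustification step of Lemma 7 can be illustrated numerically. The sketch below, under the assumption that $f$ is given as a lookup table on $\mathcal S^n$ for a tiny alphabet, computes the smallest $\varepsilon$ for which (165) holds over all types $q$, and compares the worst permutation average with the right-hand side of (166). The function names and the random test function are hypothetical.

```python
import itertools
import math
import random

def robustification_gap(f, alphabet_size, n):
    """Return (eps, worst permutation average, lower bound from (166))."""
    seqs = list(itertools.product(range(alphabet_size), repeat=n))
    types = [c for c in itertools.product(range(n + 1), repeat=alphabet_size)
             if sum(c) == n]
    # Smallest eps such that the i.i.d. average of f is >= 1 - eps for every type q.
    eps = 0.0
    for counts in types:
        q = [c / n for c in counts]
        avg = sum(f[s] * math.prod(q[a] for a in s) for s in seqs)
        eps = max(eps, 1.0 - avg)
    # Averaging f over all permutations of s^n equals averaging over its type class.
    classes = {}
    for s in seqs:
        classes.setdefault(tuple(sorted(s)), []).append(f[s])
    worst = min(sum(v) / len(v) for v in classes.values())
    bound = 1.0 - 3 * (n + 1) ** alphabet_size * eps
    return eps, worst, bound

def random_f(alphabet_size, n, seed=0):
    """A hypothetical test function with values close to 1."""
    rng = random.Random(seed)
    return {s: rng.uniform(0.99, 1.0)
            for s in itertools.product(range(alphabet_size), repeat=n)}
```

For any such table the returned `worst` value should not fall below `bound`, which is what the lemma asserts.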


The last term in above estimate deserves special attention. Following the lines of proof of Lemma 
3 in [9] (which was originally proven in [32]) we see that 


DiNi-\y^)\\W^ip)) = D{Ypix)NA-\ynmip)) 

X 

^Ypi^)D{Nx{-\x^,yn\\Wdp)) 

X 

= D{N{-\x^,y^)\\W^{p)®p) 

^ 5. 

It follows that for each ^ 6 we have by Lemma [11] that 

irg,(rB-„iU'”)) c |rWi,s{a:")l max 


(184) 

(185) 

(186) 

(187) 

(188) 


^ |Tvk, 5 (x")| max (189) 






u- ■ (190) 

We further estimate that for the distribution $p_{XY,\xi}\in\mathcal P(\mathcal X\times\mathcal Y)$ defined via $p_{XY,\xi}(x,y) := p(x)\,w_\xi(y|x)$ we have


\Tw,A^n\^ _ max \{r--N{-\r,xA = N{-\y^,xA}\ 

y'^:D{N{-\x'^,y^)\\pxY,i)^S 

^ _ max 

y":D(N{-\x",y”)\\pxYx)^^ 

< 2"-2.pW^^(W"e('5m))+/i(vT7) 


(191) 

(192) 

(193)
by Lemma 9 and Lemma 11. We can now re-insert this estimate into our original problem and
obtain


E4n(X^) ^ 1 - pl 2 (n) ( 2 -^-^ + K -L- 
^ 1 - pl 2 (n) + K-L- 

^ 1 - pl 2 (n) (^ 2-"''^/2 ^ 2-"-'5/2^ (196) 

^ 1 _ 2-’^"5/4 (';L97) 

for all large enough n e N, since 

K ■ L ^ ^ 2”’(™“?'^(P’^?)“'^“2'/i(v^)) (198) 


by assumption and since ’^(*5). Observe that this lower bound is entirely independent 

from the choice of s"’ 6 5”. It now follows from the Chernoff bound Lemma 0] that 

1 r 

P(Vs”: - 2 ^ (l-e)]Ed."(/C)) ^ |5r-exp(-r-e2-Ed,n(/C)/3) (199) 

^7=1 

^ exp(n • log(|5|) - r • • (1 - 2"”''^/^)/3). ( 200 ) 

Choose e = 2 “’^''^/^ to obtain the statement. 

Proof of statement 3 in Lemma 6. The proof of this statement follows from the proof of statement 1 where the functional dependence $\tau\mapsto\delta(\tau)$ is specified. □

4.5 Proof of Lemma 1

Proof of Lemma 1. We know from Lemma 6 that (if $\frac{1}{n}\log(K\cdot L) \le B(p) - \delta - 2\cdot f_1(\sqrt{2\cdot\delta})$ for some $\delta > 0$ and $n$ is large enough)

P(E 2 ) ^ 1 - exp(n • log(|5|) - L • • (i - 2 —^/ 2 )). ( 201 ) 

Stepping away from the goal of proving Lemma 1 we see that there are two possible routes which diverge from here. One is to make $\Gamma$ as small as possible, the other will be to exploit large numbers $\Gamma$. We will soon go on with the second approach and thereby prove Lemma 1, but first let us assume that we want $\Gamma$ to be as small as possible (in an asymptotic sense of course). How can we achieve this? We take any sequence $(\varepsilon_n)_{n\in\mathbb N}$ of numbers $\varepsilon_n\in[0,1]$ which converges to zero. Depending on such a choice, we set $\Gamma_n = 3\cdot\log(|\mathcal S|^n)\cdot\varepsilon_n^{-2}\cdot(1-2^{-n\delta/4})^{-1}$. It follows for the average success probability $d_{s^n}(\mathcal K_\gamma)$ as defined in Definition 3 that

1 r 

P( Vs*" : - 2 dsn{}C^) ^ (1 - e)Ed,n(/C) ) < 1, ( 202 ) 

^ 7=1 

proving the existence of a sequence of codes for which 

2 Wsr^{Dk^\xk^) ^ 1 - en (203)
(whenever r„ scales asymptotically as si If = n for some small number > 0 for

^71 

example we get F^ si ^ This type of asymptotic scaling of common randomness 

has been observed several times now in the literature, and obviously raises the question whether $\Gamma_n = \mathrm{const}\cdot n$ would be sufficient to guarantee asymptotically optimal performance, for some sufficiently large number const depending only on $|\mathcal S|$, for example.

We can now proceed with our proof of Lemma 1 by using equation (200) together with Lemma 6 and a union bound: Let $\beta > 0$ and $\tau > 0$. From now on until the end of this proof, let $\delta = \delta(\tau)$. Let

$$\frac{1}{n}\log(\Gamma_n\cdot L_n) \ge E(p) + \tau,\qquad B(p) - \delta - 2\cdot f_1(\sqrt{2\delta}) \ge \frac{1}{n}\log(K_n\cdot L_n). \qquad (204)$$

It then follows that for all large enough $n$ it holds that

$$\mathbb P(E_1\cap E_2) > 0. \qquad (205)$$

Thus, there is a realization $\mathbf x$ of $\mathbf X$ such that for this particular realization we have

$$\forall\, s^n, z^n, k:\quad \frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma}\Theta_{s^n,z^n}(x_{kl\gamma}) \in \big[(1\pm 2^{-n\tau/4})\,\mathbb E\Theta_{s^n,z^n}\big] \qquad (206)$$

$$\min_{s^n}\ \frac{1}{\Gamma}\sum_{\gamma=1}^{\Gamma} d_{s^n}(\mathcal K_\gamma) \ge 1 - 2\cdot 2^{-n\delta/2}. \qquad (207)$$

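Further, for every $k\in[K_n]$ we have (setting $\Lambda(s^n,z^n,x^n) := \Theta_{s^n,z^n}(x^n)$ for all $s^n$, $z^n$ and $x^n$)

$$\Big\|\frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma} v_{s^n}(\cdot|x_{kl\gamma}) - \mathbb E v_{s^n}(\cdot|X^n)\Big\|_1 \le \Big\|\frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma}\big(v_{s^n}(\cdot|x_{kl\gamma}) - \Lambda(s^n,\cdot,x_{kl\gamma})\big)\Big\|_1 \qquad (208)$$

$$\quad + \Big\|\frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma}\Lambda(s^n,\cdot,x_{kl\gamma}) - \mathbb E\Lambda(s^n,\cdot,X^n)\Big\|_1 \qquad (209)$$

$$\quad + \big\|\mathbb E v_{s^n}(\cdot|X^n) - \mathbb E\Lambda(s^n,\cdot,X^n)\big\|_1 \qquad (210)$$

$$\le \frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma}\big\|v_{s^n}(\cdot|x_{kl\gamma}) - \Lambda(s^n,\cdot,x_{kl\gamma})\big\|_1 + 2^{-n\tau/4} + \mathbb E\big\|v_{s^n}(\cdot|X^n) - \Lambda(s^n,\cdot,X^n)\big\|_1, \qquad (211)$$

where the first inequality is due to the triangle inequality of $\|\cdot\|_1$ and the second one due to the specific probabilistic choice of $\mathbf x$, especially the validity of (206). We now use the definition of $\Theta_{s^n,z^n}$ in order to derive bounds on the remaining quantities: for every $x^n\in T_p$ we have

$$\|v_{s^n}(\cdot|x^n) - \Lambda(s^n,\cdot,x^n)\|_1 = \sum_{z^n:\,D(N(\cdot|s^n,x^n,z^n)\|p_{SXZ}) > \delta} v^{\otimes n}(z^n|s^n,x^n) \qquad (212)$$

$$= v^{\otimes n}\big(T_{V,\delta}(s^n,x^n)^c\,\big|\,s^n,x^n\big) \qquad (213)$$

$$\le 2^{-n\delta/2} \qquad (214)$$

for all large enough $n$. Thus

$$\frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma}\big\|v_{s^n}(\cdot|x_{kl\gamma}) - \Lambda(s^n,\cdot,x_{kl\gamma})\big\|_1 + \mathbb E\big\|v_{s^n}(\cdot|X^n) - \Lambda(s^n,\cdot,X^n)\big\|_1 \le 2\cdot 2^{-n\delta/2} \qquad (215)$$

for all large enough $n\in\mathbb N$ so that we ultimately get (uniformly in $k\in[K]$) the bound

$$\Big\|\frac{1}{L\cdot\Gamma}\sum_{l,\gamma=1}^{L,\Gamma} v_{s^n}(\cdot|x_{kl\gamma}) - \mathbb E v_{s^n}(\cdot|X^n)\Big\|_1 \le 2\cdot 2^{-n\delta/2} + 2^{-n\tau/4} \le 2^{-n\nu(\tau)}, \qquad (216)$$

for all large enough $n$ and setting $\nu(\tau) := \min\{\delta(\tau),\tau\}/5$ (note that $\nu(\tau) = \tau/5$ is a valid choice). □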

4.6 Proof of Theorems [21 SI and [H (properties of Cs) 

Proof of Theorem [11 We give the proof of the properties of Cs in the same order as they were 
stated in the theorem: 

1. This is clear from m where it was proven that symmetrizability makes it impossible 
to reach reliable transmission of messages. 

2. The strategy of proof is to use Lemma 2 with $\Gamma = 1$. The reason for this is that, by assumption, $\mathfrak W$ is non-symmetrizable. Now, we know from Example 1 that this does not imply that every $\mathfrak W\circ U$ is non-symmetrizable as well. More precisely, to a given $r\in\mathbb N$ there may exist an alphabet $\mathcal U_r$, a $p\in\mathcal P(\mathcal U_r)$ and a channel $U_r\in C(\mathcal U_r,\mathcal X^r)$ such that

$$\min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U_r) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U_r) \qquad (217)$$

$$= \max_{p'\in\mathcal P(\mathcal U_r)}\ \max_{U'\in C(\mathcal U_r,\mathcal X^r)}\Big(\min_{q\in\mathcal P(\mathcal S^r)} I(p';W_q\circ U') - \max_{s^r\in\mathcal S^r} I(p';V_{s^r}\circ U')\Big) \qquad (218)$$

$$\ge C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V) - \varepsilon \qquad (219)$$

but, additionally, $(W_{s^r}\circ U_r)_{s^r\in\mathcal S^r}$ is symmetrizable. We provide here two approaches to deal with this problem: First, we will use the fact that $\mathfrak W$ is non-symmetrizable for transmission of a small number of messages that can be read by Eve but, since backwards communication from Eve to James is forbidden, are sufficient to counter any of the allowed jamming strategies. Second, we will consider a variant of the optimization problem (4) where optimization of $U_r$ is restricted to maps of the form $U_r = \mathrm{Id}\otimes U'_{r-1}$ and we will prove that these restricted maps are asymptotically as good as those that are derived from the original problem when it comes to calculating capacity. However, these maps have the additional property that they cannot turn a non-symmetrizable AVC into a symmetrizable one.

Now let $r\in\mathbb N$ be arbitrary but fixed and $p$, $U_r$ as above. Let $k, l\in\mathbb N$ be such that $n = k + l$ and $l = \lfloor\lambda\cdot n\rfloor$, where $\lambda\in(0,1)$ is arbitrary but fixed for the moment. Then from [22, Lemma 5], if $K$ satisfies the assumptions of Lemma 2 with $L$ set to one based on the properties (65), (66) and (67) of the lemma.

So, on the grounds of [2] and of the results proven in [22], we see that for every m' 6 N, r 6 N and 
(5 > 0, p 6 V^'{Ur) (where Ur = [|7b|''] ) and U e C{Ur,X'^) there exists a code K. = (/Cm)m=i 
such that for every s'''™ e (5’’)™ = 5'’'™ we have 


1 

S ^ ^ (220)
where $\{\varepsilon_k\}_{k\in\mathbb N}\subset[0,1]$, $\lim_{k\to\infty}\varepsilon_k = 0$ and it may be assumed that $K_k = \Gamma$. In addition to that we know from [38] that there exist codes for $(\mathfrak W,\mathfrak V)$ such that

$$\min_{s^l\in\mathcal S^l}\ \frac{1}{\Gamma_l\cdot K'_l}\sum_{a,b=1}^{\Gamma_l,K'_l}\ \sum_{x^l\in\mathcal X^l} u_l(x^l|a,b)\,w_{s^l}(D'_{a,b}|x^l) \ge 1 - \delta_l, \qquad (221)$$

where $\{\delta_l\}_{l\in\mathbb N}\subset[0,1]$, $\lim_{l\to\infty}\delta_l = 0$, $\Gamma_l = \Gamma$, $u_l\in C([\Gamma_l]\times[K'_l],\mathcal X^l)$ is stochastic pre-coding and $D'_{a,b}\cap D'_{a,b'} = \emptyset$ whenever $b\ne b'$ ($a\in[\Gamma_l]$ is used as common randomness in [38], whereas here we will substitute the messages that were sent on the first $k$ channel uses for it. Note that the messages on the first $k$ channel uses are not secure against Eve). In addition to that it holds

$$\lim_{l\to\infty}\frac{1}{l}\log K'_l = C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V) - \nu \qquad (222)$$

for some arbitrarily small $\nu > 0$ and
lim - max max/(.^7; 3s* I = a) = 0. (223) 

l^co I 7e[ri] s*s5* 

The mutual information is evaluated on the random variables defined via 

= ib,z\a)) 2 ui{x^\a,b)v{P\s\x^). (224) 

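We concatenate the two codes by defining new stochastic encodings $E_n\in C([K_n],\mathcal X^n)$ via

$$e_n((x^k,x^l)|b) := \frac{1}{\Gamma_l}\sum_{a=1}^{\Gamma_l}\delta_{x_a}(x^k)\,u_l(x^l|a,b) \qquad (225)$$

and new decoding sets via

$$D_b := \bigcup_{a} D_a\times D'_{a,b} \subset \mathcal Y^n. \qquad (226)$$

It holds $D_b\cap D_{b'} = \bigcup_{a,a'}(D_a\times D'_{a,b})\cap(D_{a'}\times D'_{a',b'}) = \emptyset$ for $b\ne b'$. We set $K_n := K'_l$, $\alpha_n := \varepsilon_k$ and $\beta_n := \delta_l$ for the $l$ satisfying $l = \lfloor\lambda\cdot n\rfloor$ and the $k$ satisfying $k = n - l$. Then $\lim_{n\to\infty}\alpha_n = \lim_{n\to\infty}\beta_n = 0$. As a consequence of the Innerproduct Lemma in [2] we know that for every $s^n = (s^k,s^l)$ we have

$$\frac{1}{K_n}\sum_{b=1}^{K_n}\sum_{x^n\in\mathcal X^n} e_n(x^n|b)\,w_{s^n}(D_b|x^n) \ge \frac{1}{\Gamma_l\cdot K'_l}\sum_{a,b=1}^{\Gamma_l,K'_l}\sum_{x^l\in\mathcal X^l} u_l(x^l|a,b)\,w_{s^l}(D'_{a,b}|x^l)\,w_{s^k}(D_a|x_a) \qquad (227)$$

$$\ge 1 - 2\max\{\alpha_n,\beta_n\}. \qquad (228)$$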

That the messages b e [iF„] are also asymptotically secure in the sense that 

lim — max/(.^; 3s") ^ 1™ -r max/(.^7; 3=* |b;) (229) 

n^oo n 1^00 I s*e<S' 

= 0 (230)
follows from independence of the distributions of the messages b and the values a of the common
randomness as described in the inequalities from (jlHIl to (l57t) - Especially inequality (HHI) is valid 
since as a consequence of (j223|] . The rate of the code is calculated as 

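$$\lim_{n\to\infty}\frac{1}{n}\log K_n = \lambda\,\big(C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V) - \nu\big). \qquad (231)$$

Since $\nu$ can be arbitrarily close to 0 and $\lambda$ can be chosen arbitrarily close to 1 we have proven the desired result.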

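We now explain the second approach to proving statement 2. in Theorem 2. Here we aim to utilize the full power of Lemma 2 with $\Gamma = 1$. Our starting point are the distributions $p$ and the channels $U$ arising from the optimization (4) for fixed $r\in\mathbb N$. Note that, without loss of generality, $\mathcal U_r = \mathcal X^r$ for every $r\in\mathbb N$ in (4). Set, for every $r\in\mathbb N$,

$$C_r := \max_{p\in\mathcal P(\mathcal X^r)}\ \max_{U_r\in C(\mathcal X^r,\mathcal X^r)}\Big(\min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U_r) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U_r)\Big). \qquad (232)$$

Let $r\in\mathbb N$ be arbitrary but fixed. For an arbitrary $\varepsilon > 0$, let $p$ and $U_r$ be such that

$$C_r - \varepsilon = \min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U_r) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U_r). \qquad (233)$$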

Now define tJr+i by Ur+i{{xi,... ,Xr+i)\ix,u)) := Yix'eX'^riix',X 2 , - - - ,Xr+i)\u)5p,{xi) for all 
x,xi,..., Xr +1 6 X and ueUr = X'^. Then it holds that 


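$$C_{r+1} \ge \min_{q\in\mathcal P(\mathcal S^{r+1})} I(p\otimes\pi;W_q\circ U_{r+1}) - \max_{s^{r+1}\in\mathcal S^{r+1}} I(p\otimes\pi;V_{s^{r+1}}\circ U_{r+1}) \qquad (234)$$

$$\ge \min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U_r) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U_r) - \log|\mathcal X| \qquad (235)$$

$$= C_r - \varepsilon - \log|\mathcal X|, \qquad (236)$$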

where vr 6 V{X) is defined by tt{x) := for all x e X. This latter estimate is due to 

the equality I{p ® vr; Vgr+i o Ur+i) = I{p] Vgr oU) + I{Tr] the data processing inequality 

and the fact that for arbitrary channels S e C{A x B,C) and T e C{A! x B',C'), as well as 
distributions q e S(B x B') with respective marginal distributions qb e V{B) and qb' £ V{B') 
and p 6 5(^ x A!) with respective marginal distributions pA £ 'P^A) and qA' £ 'P(Al') we have 


V {a,h,c) e Ax B X C : 


EE s{c\a,b)t{c\d .,h')p{a,a)q{bA') = '^qBPAt{c\aA). (237) 

a',c' h,h' h 


Since $\mathfrak W$ is non-symmetrizable we know that $\mathfrak W^{\otimes r}\circ U_r$ is non-symmetrizable for every $r\ge 2$. The reason for that is explained as follows: Let again $S$, $T$ be channels as above. Assume that $S$ is symmetrizable but $T$ is not. Then $S\otimes T$ is non-symmetrizable. This can be seen by assuming the existence of a symmetrizing map $Q\in C(\mathcal A\times\mathcal A',\mathcal B\times\mathcal B')$. The statement

$$\forall (a_1,a_2,a'_1,a'_2)\in\mathcal A^2\times\mathcal A'^2:\quad \sum_{b,b'} s(\cdot|a_1,b)\,t(\cdot|a'_1,b')\,q(b,b'|a_2,a'_2) = \sum_{b,b'} s(\cdot|a_2,b)\,t(\cdot|a'_2,b')\,q(b,b'|a_1,a'_1) \qquad (238)$$

would obviously imply for any fixed choice of $(a_1,a_2)$ the statement

$$\forall (a'_1,a'_2)\in\mathcal A'\times\mathcal A':\quad \sum_{b'} t(\cdot|a'_1,b')\,q_{B'}(b'|a'_2,a_2) = \sum_{b'} t(\cdot|a'_2,b')\,q_{B'}(b'|a'_1,a_1), \qquad (239)$$

where $q_{B'}(b'|a'_1,a_1) := \sum_b q(b,b'|a_1,a'_1)$. This would be in contradiction to non-symmetrizability of $T$. Since $U_r = U_{r-1}\otimes\mathrm{Id}$ we can thus conclude that $\mathfrak W^{\otimes r}\circ U_r$ is non-symmetrizable. We now proceed with the proof of Theorem 2.

With this approach we have evaded the problem that $\mathfrak W^{\otimes r}\circ U_r$ may well be symmetrizable (see our Example 1).
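Whether a given candidate map symmetrizes a small AVC can be verified mechanically. The sketch below assumes the standard symmetrizability condition $\sum_s w(y|x,s)\,q(s|x') = \sum_s w(y|x',s)\,q(s|x)$ for all $x,x',y$ (taken here to match the definition referred to in the introduction); the dictionary layout and function name are illustrative choices, not the paper's notation.

```python
from itertools import product

def is_symmetrizer(w, q, inputs, states, outputs, tol=1e-12):
    """Check whether q symmetrizes the AVC w.

    w[(y, x, s)] is the probability of output y given input x and state s.
    q[(s, x)] is the probability that the candidate map picks state s given x.
    """
    for x, x2, y in product(inputs, inputs, outputs):
        lhs = sum(w[(y, x, s)] * q[(s, x2)] for s in states)
        rhs = sum(w[(y, x2, s)] * q[(s, x)] for s in states)
        if abs(lhs - rhs) > tol:
            return False
    return True

# Example 1 (assumed for illustration): y = x XOR s, symmetrized by q(s|x) = [s == x].
bits = (0, 1)
w_xor = {(y, x, s): 1.0 if y == (x ^ s) else 0.0
         for y, x, s in product(bits, bits, bits)}
q_copy = {(s, x): 1.0 if s == x else 0.0 for s, x in product(bits, bits)}

# Example 2 (assumed for illustration): a noiseless channel with a single state.
w_clean = {(y, x, 0): 1.0 if y == x else 0.0 for y, x in product(bits, bits)}
q_trivial = {(0, x): 1.0 for x in bits}
```

Here `is_symmetrizer(w_xor, q_copy, bits, bits, bits)` returns `True`, while the single-state channel admits only the trivial map, which fails the check.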

By [22, Lemma 4] non-symmetrizability of $\mathfrak W^{\otimes r}\circ U_r$ implies that it is possible to define a decoder according to [22, Definition 3], with $N = K\cdot L$ and $[N]$ replaced by $[K]\times[L]$. Since only the number of codewords and their type ever enters the proof it makes no difference whether we enumerate them by one index taken from $[N]$ or by two indices taken from $[K]\times[L]$. This decoder is proven to work reliably in [22, Lemma 5] (even with an exponentially fast decrease of average error), if $N = K\cdot L$ satisfies the assumptions of Lemma 2 based on the properties (65), (66) and (67) of the lemma.

So, on the grounds of Lemma 2 and of the results proven in [22], we see that for every $m\in\mathbb N$, $r\in\mathbb N\setminus\{1\}$ and $\delta > 0$, $p\in\mathcal P_0^m(\mathcal U_r)$ and $U\in C(\mathcal U_r,\mathcal X^r)$ there exists a code $\mathcal K = (\mathcal K_m)_{m=1}^{\infty}$ such that for every $s^{r\cdot m}\in(\mathcal S^r)^m = \mathcal S^{r\cdot m}$ we have

1 1 

l^T- 2 (240) 

k,l=l X™- 

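where $\{\varepsilon_m\}_{m\in\mathbb N}\subset[0,1]$, $\lim_{m\to\infty}\varepsilon_m = 0$ and it holds that

$$\liminf_{m\to\infty}\frac{1}{m}\log(K_m\cdot L_m) \ge \min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U_r) - \delta \qquad (241)$$

(the code we use here is defined by using the codewords together with the decoder from [22, Definition 3] defined for the AVC $\mathfrak W^{\otimes r}\circ U_r := (W_{s^r}\circ(U_{r-1}\otimes\mathrm{Id}))_{s^r\in\mathcal S^r}$) and

$$\max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U_r) + 2\delta \ge \liminf_{m\to\infty}\frac{1}{m}\log L_m \ge \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U_r) + \delta, \qquad (242)$$

implying that for a sequence $(p_m)_{m\in\mathbb N}$ of choices for $p_m$ converging to some $p$ having a decomposition $p = p'\otimes\pi$ for $p'\in\mathcal P(\mathcal X^{r-1})$ being an optimal choice in the sense of (232) we get

$$\liminf_{m\to\infty}\frac{1}{m}\log K_m \ge \min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U_r) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U_r) - 3\delta \qquad (243)$$

$$\ge C_{r-1} - \log|\mathcal X| - 3\delta. \qquad (244)$$

Also, it is clear from the last part of Lemma 2 (equation (68)) together with [38, Lemma 20] that the codes employed here are asymptotically secure in the strong sense:

$$\limsup_{m\to\infty}\ \max_{s^{r\cdot m}} I(\mathfrak K;\mathfrak Z_{s^{r\cdot m}}) = 0. \qquad (245)$$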

We now wish to apply the code for the extended channel $(\mathfrak W^{\otimes r}\circ U_r,\mathfrak V^{\otimes r}\circ U_r)$ to the original channel $(\mathfrak W,\mathfrak V)$. Define values $t_n\in\{0,\dots,r-1\}$ by requiring $n = m\cdot r + t_n$ to hold for some suitably chosen $m = m(n)\in\mathbb N$. This quantity satisfies $-1 + n/r \le m(n) \le n/r$. For every $n\in\mathbb N$ we then define new decoding sets by

$$\hat D_{kl} := D_{kl}\times\mathcal Y^{t_n} \qquad (246)$$

and new randomized encodings by setting for some arbitrary but fixed $x^{t_n}\in\mathcal X^{t_n}$

L 

L' 


^ 1 

E{x^\k) := 2 ■ 5,t4x^-). 


(247) 


1=1 


From the choice of codewords and the decoding rule it is clear that this code is asymptotically reliable. The asymptotic number of codewords (mind that $K_n = K_{m(n)}$), calculated and normalized with respect to $n$, is

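$$\liminf_{n\to\infty}\frac{1}{n}\log K_n = \liminf_{n\to\infty}\frac{1}{n}\log K_{m(n)} \qquad (248)$$

$$\ge \liminf_{n\to\infty}\frac{1}{r\cdot(m(n)+1)}\log K_{m(n)} \qquad (249)$$

$$= \liminf_{n\to\infty}\frac{1}{r}\cdot\frac{m(n)}{m(n)+1}\cdot\frac{1}{m(n)}\log K_{m(n)} \qquad (250)$$

$$= \frac{1}{r}\liminf_{n\to\infty}\frac{1}{m(n)}\log K_{m(n)} \qquad (251)$$

$$\ge \frac{1}{r}\Big(\min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U_r) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U_r) - 3\delta\Big) \qquad (252)$$

$$\ge \frac{1}{r}\big(C_{r-1} - \log|\mathcal X| - 3\delta\big). \qquad (253)$$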
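The bookkeeping behind (248) to (251) is elementary: the block length $n$ is split into $m(n)$ blocks of length $r$ plus a remainder, and the rate measured per block is renormalized per channel use. A small Python sketch of this arithmetic follows; the function names and the per-block rate argument are hypothetical.

```python
def split_block_length(n, r):
    """Write n = m * r + t with 0 <= t < r and return (m, t)."""
    m, t = divmod(n, r)
    return m, t

def normalized_rate(n, r, rate_per_block):
    """Rate per channel use when m(n) blocks of length r each carry rate_per_block bits."""
    m, _ = split_block_length(n, r)
    return m * rate_per_block / n
```

As $n$ grows, `normalized_rate(n, r, R)` approaches `R / r`, which mirrors the factor $1/r$ appearing in (251).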


In addition to that, the code is secure: For each n e N, the distribution of the input codewords 
and Eve’s outputs is 


P(J^ = A:,3s- = z^) 

^ 1 
1^1 

= P(i7. = = 7'™) • n®*"(7"|x*",s‘-). (255) 

This demonstrates that (uniformly in s"" e 5” and since ^ holds) we have 

3s'-'-) + 0 = 3s'-’-)- (256) 

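Since the right hand side of above equation goes to zero for $n$ going to infinity and since $\lim_{r\to\infty}\frac{r-1}{r} = 1$ we see that the capacity $C_S$ is lower bounded by $\lim_{r\to\infty}\frac{1}{r}C_r$. It is not an immediate consequence that this implies we can reach the capacity $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V) = C^*(\mathfrak W,\mathfrak V)$.

Fortunately it has been proven in [38] that

$$C^*(\mathfrak W,\mathfrak V) = \lim_{r\to\infty}\frac{1}{r}\max_{p\in\mathcal P(\mathcal U_r)}\ \max_{U_r\in C(\mathcal U_r,\mathcal X^r)}\Big(\min_{q\in\mathcal P(\mathcal S^r)} I(p;W_q\circ U_r) - \max_{s^r\in\mathcal S^r} I(p;V_{s^r}\circ U_r)\Big) \qquad (257)$$

holds. Thus $\lim_{r\to\infty}\frac{1}{r}C_r = C^*(\mathfrak W,\mathfrak V)$. This finally implies the desired result. □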

Proof of Theorem 3. If $C_S(\mathfrak W,\mathfrak V) = 0$, there is nothing to prove. Assume that $C_S(\mathfrak W,\mathfrak V) > 0$. It is evident that, in that case, $\mathfrak W$ is not symmetrizable. The function $F$ (whose definition can be picked up from equation (58)) is continuous with respect to the Hausdorff distance (proving this statement is in complete analogy to the corresponding part in the proof of Theorem 5 in [15]). Thus, if $F(\mathfrak W) > 0$, then there is an $\varepsilon > 0$ such that for all $\mathfrak W'$ satisfying $d(\mathfrak W,\mathfrak W') < \varepsilon$ we know that $F(\mathfrak W') > 0$ as well. Thus, every one of these $\mathfrak W'$ is non-symmetrizable.

For some suitably chosen $\varepsilon' < \varepsilon$ we additionally know from Theorem 9 in [38] that $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W',\mathfrak V) > 0$ for all those $\mathfrak W'$ for which $d(\mathfrak W,\mathfrak W') < \varepsilon'$. But since Theorem 1 shows that $F(\mathfrak W') > 0 \Rightarrow C_S(\mathfrak W',\mathfrak V) = C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W',\mathfrak V)$ this implies that

$$C_S(\mathfrak W',\mathfrak V) > 0 \quad \forall\,\mathfrak W' : d(\mathfrak W,\mathfrak W') < \varepsilon'. \qquad (258)$$

Since from Theorem 3 we know that positivity of $C_S(\mathfrak W',\mathfrak V)$ ensures that it equals $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W',\mathfrak V)$, and since the latter is continuous, we are done. □

Proof of Theorem Again, we prove everything in the same order as it is listed in the theorem. 

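1. Let $C_S$ be discontinuous in the point $(\mathfrak W,\mathfrak V)$. By Theorem 3 we know that this can only be the case if $C_S(\mathfrak W,\mathfrak V) = 0$. If in addition we have $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V) = 0$ then we have, since $C^{\mathrm{mean}}_{S,\mathrm{ran}}$ is continuous, that for every $\varepsilon > 0$ there is $\delta > 0$ such that for all $(\mathfrak W_\delta,\mathfrak V)$ satisfying $d(\mathfrak W_\delta,\mathfrak W) < \delta$ we have $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_\delta,\mathfrak V) \le \varepsilon$. Since $C^{\mathrm{mean}}_{S,\mathrm{ran}} \ge C_S$ this would imply that $C_S$ is continuous as well, in contradiction to the assumption. Thus $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V) > 0$. Of course this immediately implies that $\mathfrak W$ has to be symmetrizable, by property 2. This is, in turn, equivalent to $F(\mathfrak W) = 0$. The definition of $F$ can be picked up from equation (58); its connection to symmetrizability is obvious from the definition. The notion of symmetrizability is explained in the introduction in equation (3). Clearly, if for some $\varepsilon > 0$ all $\mathfrak W'$ satisfying $d(\mathfrak W,\mathfrak W') < \varepsilon$ had $F(\mathfrak W') = 0$, then $C_S(\mathfrak W',\mathfrak V')$ would be zero in a whole vicinity of $(\mathfrak W,\mathfrak V)$. Thus for all $\varepsilon > 0$ there has to be at least one $\mathfrak W_\varepsilon$ such that $d(\mathfrak W,\mathfrak W_\varepsilon) < \varepsilon$ but $F(\mathfrak W_\varepsilon) > 0$.

The reverse direction is basically established by using all our arguments backwards: For all $\varepsilon > 0$, let there be at least one $\mathfrak W_\varepsilon$ such that $d(\mathfrak W,\mathfrak W_\varepsilon) < \varepsilon$ but $F(\mathfrak W_\varepsilon) > 0$. Let in addition to that $F(\mathfrak W) = 0$ but $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V) > 0$. Since $C^{\mathrm{mean}}_{S,\mathrm{ran}}$ is continuous, there is a $\delta > 0$ such that $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W',\mathfrak V') > (1/2)\,C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\mathfrak V) =: a$ whenever $d((\mathfrak W,\mathfrak V),(\mathfrak W',\mathfrak V')) < \delta$.

For every $\varepsilon' \le (1/2)\min\{\varepsilon,\delta\}$ we can therefore deduce the following: It holds that $C_S(\mathfrak W_{\varepsilon'},\mathfrak V) = C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_{\varepsilon'},\mathfrak V) \ge a > 0$ (since $F(\mathfrak W_{\varepsilon'}) > 0$), but $C_S(\mathfrak W,\mathfrak V) = 0$. Thus $C_S$ is discontinuous in the point $(\mathfrak W,\mathfrak V)$.

2. Let $C_S$ be discontinuous in the point $(\mathfrak W,\mathfrak V)$. By property 4 this implies that for all $\varepsilon > 0$ there is $\mathfrak W_\varepsilon$ such that $d(\mathfrak W,\mathfrak W_\varepsilon) < \varepsilon$ but $F(\mathfrak W_\varepsilon) > 0$. If $\hat{\mathfrak V}$ is such that $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W,\hat{\mathfrak V}) > 0$ then the pair $(\mathfrak W,\hat{\mathfrak V})$ fulfills all the points in the second of the two equivalent formulations in statement 4, and this implies that $C_S$ is discontinuous in the point $(\mathfrak W,\hat{\mathfrak V})$. □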

4.7 Proof of Lemma 2

Proof of Lemma 2. The proof is in many ways similar to the one for Lemma 1. As we know already that for some $c' > 0$ and all large enough $n\in\mathbb N$

n E 4 n E^) ^ 1 — F • exp(2“”'® ) (259) 

holds from [22], there is not much left to prove, as only P(Ci) needs to be controlled in order to 
get statement (|68l) of Lemma [2j We know from Lemma [6| that both 

F{eI) ^ 2 • IA X 5 X ZI" • exp(-2'''^/2)^ (260)
if we choose 6 = S{t). Keeping in mind that we already know from [22] that n E 4 ^n E^) ^
1 — r • exp(2’^''^) we can combine all the previous to get the statement 

P(K 3 n ... n Ki) ^ 1 - (2 + r) • exp(-2”'^"), (261) 

for some c" > 0 and for all large enough n. If F scales at most exponentially there will thus exist 
A^o ^ N such that for all Nq there exists a choice x = satisfying all conditions 

in Lemma [2] and, in addition, the estimate 

. i,r 

V z^,k-. YErY, i [(1 + (262) 

/,7^1 

That this leads to secure transmission is proven exactly as in the proof of Lemma 1. The Lemma is thus proven. □

4.8 Proof of Theorem 5 (super-activation results)

We will divide this proof into parts, each corresponding to its counterpart in Theorem 5.

Proof. 1. Let us start with the “only if” statement. Clearly, if $\mathfrak W_1\otimes\mathfrak W_2$ is symmetrizable then $C_S(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2) = 0$. So, this part of the statement is proven.

If, on the other hand, $\mathfrak W_1\otimes\mathfrak W_2$ is not symmetrizable and $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2) > 0$ then on account of Theorem 2, statement 1, we know that $C_S(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2) > 0$.

This proves the first part of the Theorem.

2. In [16], Section VI, an explicit example of a pair $(\mathfrak W_1,\mathfrak V_1)$, $(\mathfrak W_2,\mathfrak V_2)$ has been given with the property that $\mathfrak W_1$ is symmetrizable, but $\mathfrak W_2$ is not. By elementary calculus, this implies that $\mathfrak W_1\otimes\mathfrak W_2$ is non-symmetrizable.

Since this holds, our Theorem 2, statement 1, shows that the uncorrelated capacity of $(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2)$ equals its randomness-assisted capacity.

In [16] it was further shown that $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_1,\mathfrak V_1) > 0$ and $C_S(\mathfrak W_i,\mathfrak V_i) = 0$ ($i = 1,2$).

3. By assumption, $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_i,\mathfrak V_i) = 0$ ($i = 1,2$) but $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2) > 0$. The former implies $C_S(\mathfrak W_i,\mathfrak V_i) = 0$ ($i = 1,2$). If $\mathfrak W_1$ and $\mathfrak W_2$ were symmetrizable then clearly $\mathfrak W_1\otimes\mathfrak W_2$ would be symmetrizable and by [27] the message transmission capacity of $\mathfrak W_1\otimes\mathfrak W_2$ would be zero, implying $C_S(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2) = 0$. If on the other hand either $\mathfrak W_1$ or $\mathfrak W_2$ is not symmetrizable then $\mathfrak W_1\otimes\mathfrak W_2$ is not symmetrizable and this implies

$$C_S(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2) = C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2) > 0, \qquad (263)$$

where the equality is due to Theorem 2, part 1, and the lower bound is true by assumption.

4. We do again rely on Theorem 2. Let both $\mathfrak W_1$ and $\mathfrak W_2$ be symmetrizable. Then $\mathfrak W_1\otimes\mathfrak W_2$ is symmetrizable. Since by assumption $C^{\mathrm{mean}}_{S,\mathrm{ran}}$ shows no super-activation on the pair $(\mathfrak W_i,\mathfrak V_i)$ ($i = 1,2$) it follows that $C_S$ cannot show super-activation either. Thus at least one of the two AVCs has to be non-symmetrizable. Let without loss of generality this channel be $\mathfrak W_1$. If in addition $\mathfrak W_2$ were non-symmetrizable, then $C_S(\mathfrak W_i,\mathfrak V_i) = C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_i,\mathfrak V_i)$ would hold for $i = 1,2$ and since $\mathfrak W_1\otimes\mathfrak W_2$ would be non-symmetrizable as well, we would additionally have $C_S(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2) = C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_1\otimes\mathfrak W_2,\mathfrak V_1\otimes\mathfrak V_2)$. But since $C^{\mathrm{mean}}_{S,\mathrm{ran}}$ shows no super-activation on the pair $(\mathfrak W_i,\mathfrak V_i)$ ($i = 1,2$) this cannot be. Thus again without loss of generality we have that $\mathfrak W_2$ is symmetrizable.

Since we are talking about super-activation of $C_S$, it has to be that $C_S(\mathfrak W_i,\mathfrak V_i) = 0$ holds for $i = 1,2$. But since $\mathfrak W_1$ is non-symmetrizable this requires that $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_1,\mathfrak V_1) = 0$ holds. If in addition $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_2,\mathfrak V_2) = 0$ would hold then $C_S$ could not be super-activated since $C^{\mathrm{mean}}_{S,\mathrm{ran}}$ cannot be super-activated by assumption. Thus $C^{\mathrm{mean}}_{S,\mathrm{ran}}(\mathfrak W_2,\mathfrak V_2) > 0$. □

4.9 Proof of Lemma [3] 

We now prove Lemma [3) First and without loss of generality, we have A ^ ^ . Let it be 
symmetrizable. Let Q e C{A,TZ) be the symmetrizing channel, meaning that for all a,a' e A 
the equality 


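$$(U\circ(\mathrm{Id}\otimes Q))(a,a') = (U\circ(\mathrm{Id}\otimes Q))(a',a) \qquad (264)$$

holds true. It follows that for all $a,a'\in\mathcal A$ it holds that

$$(U\circ(T\otimes QT))(a,a') = \sum_{a'',a'''\in\mathcal A}\ \sum_{r\in\mathcal R} u(\cdot|a'',r)\,t(a''|a)\,q(r|a''')\,t(a'''|a') \qquad (265)$$

$$= \sum_{a'',a'''\in\mathcal A}\ \sum_{r\in\mathcal R} u(\cdot|a''',r)\,t(a''|a)\,q(r|a'')\,t(a'''|a') \qquad (266)$$

$$= (U\circ(T\otimes QT))(a',a). \qquad (267)$$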


Thus, il' is symmetrizable. 


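5 Appendix (auxiliary results and proofs)

Lemma 8 (Cf. [H]). Let $p\in\mathcal P(\mathcal X)$. For every $n\ge|\mathcal X|^2$, there is $p'\in\mathcal P_0^n(\mathcal X)$ such that

$$\|p - p'\|_1 \le \frac{2|\mathcal X|}{n} \qquad (268)$$

and $p(x) = 0$ implies $p'(x) = 0$ for all $x\in\mathcal X$.

Proof of Lemma 8. Let $n\in\mathbb N$ be arbitrary. Set $\mathcal X' := \{x\in\mathcal X : p(x) > 0\}$. From the next lines it will follow that, without loss of generality, we may assume $\mathcal X = \mathcal X'$. For sake of simplicity, assume again without loss of generality that $\mathcal X = \{1,\dots,|\mathcal X|\}$ and that $p(|\mathcal X|) \ge 1/|\mathcal X|$. Choose $p'(i)$, for $i = 1,\dots,|\mathcal X|-1$, such that $|p'(i) - p(i)| \le 1/n$. Clearly, this is possible. Then necessarily $p'(|\mathcal X|) = 1 - \sum_{i=1}^{|\mathcal X|-1} p'(i)$ and

$$\|p - p'\|_1 \le \frac{|\mathcal X|-1}{n} + |p(|\mathcal X|) - p'(|\mathcal X|)| \qquad (269)$$

$$= \frac{|\mathcal X|-1}{n} + \Big|\sum_{i=1}^{|\mathcal X|-1}\big(p(i) - p'(i)\big)\Big| \qquad (270)$$

$$\le \frac{|\mathcal X|-1}{n} + \sum_{i=1}^{|\mathcal X|-1}|p(i) - p'(i)| \qquad (271)$$

$$\le \frac{2|\mathcal X|}{n}. \qquad (272)$$

Of course, while all the $p'(i) \ge 0$ by construction if $i < |\mathcal X|$, this does not hold for $p'(|\mathcal X|)$. This is where we need the additional condition that $n\ge|\mathcal X|^2$:

$$p'(|\mathcal X|) = 1 - \sum_{i=1}^{|\mathcal X|-1} p'(i) \qquad (273)$$

$$\ge 1 - \sum_{i=1}^{|\mathcal X|-1}\Big(p(i) + \frac{1}{n}\Big) \qquad (274)$$

$$= p(|\mathcal X|) - \frac{|\mathcal X|-1}{n} \qquad (275)$$

$$\ge \frac{1}{|\mathcal X|} - \frac{|\mathcal X|-1}{n} \qquad (276)$$

$$\ge 0. \qquad (277)$$

□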
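The construction in this proof is easy to try out. The sketch below is a variant rather than the exact construction above: it rounds every entry except a largest one down to a multiple of $1/n$ and lets the largest entry absorb the remainder, which keeps all entries non-negative for every $n$. The function name is hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction
import math

def approximate_by_type(p, n):
    """Return a type p' with denominator n that is close to p in l1 distance.

    p is a list of Fractions summing to 1. Every entry except one of the
    largest is rounded down to a multiple of 1/n; the remaining mass goes
    to that largest entry.
    """
    last = max(range(len(p)), key=lambda i: p[i])
    p_prime = [Fraction(math.floor(p_i * n), n) for p_i in p]
    p_prime[last] = 1 - sum(p_prime[i] for i in range(len(p)) if i != last)
    return p_prime

def l1_distance(p, q):
    """l1 distance between two distributions given as lists."""
    return sum(abs(a - b) for a, b in zip(p, q))
```

With this rounding rule the $\ell_1$ distance stays below $2(|\mathcal X|-1)/n$, and symbols of probability zero keep probability zero.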


Lemma 9 (C.f. |l9j ). Let oL' e A"" and e . There exists a function /c : N ^ M+ such 
that with AB being distributed as F{{A,B) = (a, 6)) = ^N{a,b\d^,b^) we have 

|{a’^ : iV(-|a”K'') = 6”)}| = (278) 


The function fc satisfies limn,^oo fcip) = 0- 

The following Lemma is basically taken from m- It would generally be completely sufficient 
for proving all our statements in sufficient generality. 

Lemma 10. Let D{p\\q) ^ 5. For the function fi : [0,1/2] ^ M+ defined by fi{x) := 
x/2\og{x\Z\^) we have that 


\H{p)-H{q)\^m. 


(279) 


Clearly, lim^^^o fii^) = 0- 

Note that $p(x) = 0$ implies $p'(x|s) = 0$ for all $s\in\mathcal S$, by construction.
Proof. From Pinsker's inequality we have $\|p - q\|_1 \le \sqrt{2\delta}$ and, accordingly, by Lemma 2.7 in [20], $|H(p) - H(q)| \le -\sqrt{2\delta}\,\log(\sqrt{2\delta}/|\mathcal Z|)$. □
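Both ingredients of this proof can be checked numerically on random distributions. The sketch below works in natural logarithms throughout (an assumption made here so that Pinsker's inequality reads $\|p-q\|_1 \le \sqrt{2D(p\|q)}$) and tests the entropy continuity bound $|H(p)-H(q)| \le -\theta\log(\theta/|\mathcal Z|)$ for $\theta = \|p-q\|_1 \le 1/2$. All helper names are illustrative.

```python
import math
import random

def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(a * math.log(a) for a in p if a > 0)

def kl(p, q):
    """Relative entropy D(p||q) in nats; q must be strictly positive."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def random_pair(size, seed):
    """A strictly positive q and a mild perturbation p of it."""
    rng = random.Random(seed)
    q = [rng.uniform(0.1, 1.0) for _ in range(size)]
    q = [v / sum(q) for v in q]
    p = [v * rng.uniform(0.8, 1.2) for v in q]
    p = [v / sum(p) for v in p]
    return p, q

def check_pair(p, q):
    """Return (Pinsker holds, entropy continuity holds or None if not applicable)."""
    theta = l1(p, q)
    pinsker = theta <= math.sqrt(2 * kl(p, q)) + 1e-12
    if theta == 0 or theta > 0.5:
        return pinsker, None
    bound = -theta * math.log(theta / len(p))
    return pinsker, abs(entropy(p) - entropy(q)) <= bound + 1e-12
```

For example, `check_pair(*random_pair(4, 0))` returns a pair of booleans indicating whether each inequality holds for that sample.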


We did however feel that it would be interesting to use a slightly more general version of Lemma 10 which led us to prove the following Lemma:

Lemma 11 (Continuity of conditional entropy with respect to averaged norm). Let $p\in\mathcal P(\mathcal X)$ and channels $w, r : \mathcal P(\mathcal X)\to\mathcal P(\mathcal Z)$ be given such that

$$\sum_{x\in\mathcal X} p(x)\,\|w(\cdot|x) - r(\cdot|x)\|_1 \le \delta \le 1. \qquad (280)$$

Then

$$|H(w|p) - H(r|p)| \le f_1(\delta), \qquad (281)$$

where $f_1(\delta) := |\mathcal Z|\cdot h(\delta/|\mathcal Z|)$.
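Lemma 11 lends itself to a quick numerical sanity check. In the sketch below channels are represented as lists of rows (one output distribution per input symbol), entropies are measured in bits, and $\delta$ is taken to be the exact averaged $\ell_1$ distance from (280). These representation choices and all names are assumptions made for illustration.

```python
import math
import random

def binary_entropy(x):
    """Binary entropy h(x) in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def entropy_bits(row):
    return -sum(a * math.log2(a) for a in row if a > 0)

def conditional_entropy(channel, p):
    """H(w|p) = sum_x p(x) H(w(.|x))."""
    return sum(p_x * entropy_bits(row) for p_x, row in zip(p, channel))

def lemma11_sides(w, r, p):
    """Return (|H(w|p) - H(r|p)|, |Z| * h(delta / |Z|), delta)."""
    delta = sum(p_x * sum(abs(a - b) for a, b in zip(wr, rr))
                for p_x, wr, rr in zip(p, w, r))
    z = len(w[0])
    lhs = abs(conditional_entropy(w, p) - conditional_entropy(r, p))
    return lhs, z * binary_entropy(delta / z), delta

def random_channel(n_in, n_out, rng):
    rows = []
    for _ in range(n_in):
        row = [rng.random() for _ in range(n_out)]
        total = sum(row)
        rows.append([v / total for v in row])
    return rows
```

Calling `lemma11_sides(w, r, p)` on two random channels and an input distribution returns both sides of (281) so they can be compared directly.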

Proof of Lemma 11. As in [20], set $\nu(t) := -t\log t$ and observe that $\nu$ is concave and satisfies $\nu(0) = \nu(1) = 0$. This brings with it the property that for all $s,\lambda\in[0,1]$ we have

$$\nu(\lambda\cdot s) \ge \lambda\cdot\nu(s),\qquad \nu(\lambda\cdot s + 1 - \lambda) \ge \lambda\cdot\nu(s). \qquad (282)$$

We wish to obtain a meaningful bound on $|\nu(s) - \nu(t)|$ in terms of $|s - t|$. To this end, assume without loss of generality that $s\le t$. Observe that this implies that $|t - s| = t - s$, so that both

$$\nu(|t-s|) + \nu(s) = \nu\Big(t\cdot\frac{t-s}{t}\Big) + \nu\Big(t\cdot\frac{s}{t}\Big) \qquad (283)$$

$$\ge \frac{t-s}{t}\cdot\nu(t) + \frac{s}{t}\cdot\nu(t) \qquad (284)$$

$$= \nu(t) \qquad (285)$$

and with $\lambda := \frac{t-s}{1-s}$ satisfying $0\le\lambda\le 1$ we have

$$\nu(1 - |t-s|) + \nu(t) = \nu(\lambda\cdot s + 1 - \lambda) + \nu(\lambda + (1-\lambda)\cdot s) \qquad (286)$$

$$\ge \lambda\,\nu(s) + (1-\lambda)\,\nu(s) \qquad (287)$$

$$= \nu(s) \qquad (288)$$

so that in total we get for every two numbers $s,t\in[0,1]$:

$$|\nu(t) - \nu(s)| \le \max\{\nu(|t-s|),\ \nu(1-|t-s|)\} \qquad (289)$$

$$\le \nu(|t-s|) + \nu(1-|t-s|) \qquad (290)$$

$$= h(|t-s|) \qquad (291)$$

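where $h$ denotes the binary entropy. Then for every $(\varepsilon_x)_{x\in\mathcal X}\in[-1,1]^{|\mathcal X|}$ and $(t_x)_{x\in\mathcal X}\in[0,1]^{|\mathcal X|}$ such that $t_x + \varepsilon_x\in[0,1]$ for all $x\in\mathcal X$ we get:

$$\Big|\sum_{x\in\mathcal X} p(x)\big(\nu(t_x) - \nu(t_x+\varepsilon_x)\big)\Big| \le \sum_{x\in\mathcal X} p(x)\,|\nu(t_x) - \nu(t_x+\varepsilon_x)| \qquad (292)$$

$$\le \sum_{x\in\mathcal X} p(x)\,h(|\varepsilon_x|) \qquad (293)$$

$$\le h\Big(\sum_{x\in\mathcal X} p(x)\,|\varepsilon_x|\Big). \qquad (294)$$

Then, we write $t_{xz} := w(z|x)$ and $\varepsilon_{xz} := -w(z|x) + r(z|x)$. This leads to the bound we ultimately need:

$$\Big|\sum_{x\in\mathcal X} p(x)\big(H(w(\cdot|x)) - H(r(\cdot|x))\big)\Big| = \Big|\sum_{z\in\mathcal Z}\sum_{x\in\mathcal X} p(x)\big(\nu(w(z|x)) - \nu(r(z|x))\big)\Big| \qquad (295)$$

$$\le \sum_{z\in\mathcal Z}\Big|\sum_{x\in\mathcal X} p(x)\big(\nu(t_{xz}) - \nu(t_{xz}+\varepsilon_{xz})\big)\Big| \qquad (296)$$

$$\le \sum_{z\in\mathcal Z} h\Big(\sum_{x\in\mathcal X} p(x)\,|\varepsilon_{xz}|\Big) \qquad (297)$$

$$\le |\mathcal Z|\cdot h\Big(\frac{1}{|\mathcal Z|}\sum_{x\in\mathcal X}\sum_{z\in\mathcal Z} p(x)\,|\varepsilon_{xz}|\Big) \qquad (298)$$

$$\le |\mathcal Z|\cdot h\Big(\frac{\delta}{|\mathcal Z|}\Big). \qquad (299)$$

□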